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Executive  Sumiriciry 


The  primary  objective  of  this  project  is  to  develop  an  advanced  algorithm 
for  parallel  supercomputers  to  model  time-dependent  magnetohydrodynam¬ 
ics  (MHD)  in  all  three  dimensions.  This  will  provide  a  valuable  tool  for 
the  design  and  testing  of  plasma  related  technologies  that  are  important  to 
the  Air  Force  and  industry.  Implementing  the  algorithm  on  parallel  super¬ 
computers  will  allow  the  detailed  modeling  of  realistic  plasmas  in  complex 
three-dimensional  geometries. 

We  have  developed  a  time-dependent,  two-dimensional,  arbitrary-geometry 
version  of  the  algorithm,  placed  it  into  a  testbed  code,  added  the  modifica¬ 
tions  necessary  for  viscous  and  resistive  effects,  and  tested  the  code  against 
known  analytical  problems.  We  have  implemented  the  algorithm  on  a  paral¬ 
lel  architecture  and  investigated  parallelization  strategies.  Future  plans  in¬ 
clude  installing  the  algorithm  into  MACH2,  optimizing  the  parallelization, 
extending  the  code  to  three  dimensions,  installing  the  three  dimensional 
algorithm  into  MACH3[1],  and  calibrating  the  code  with  experimental  data. 

As  a  result  of  this  project  several  professional  collaborations  now  exist 
between  the  Department  of  Aeronautics  and  Astronautics  at  the  University 
of  Washington  and  the  Air  Force  Phillips  Laboratory,  Lawrence  Livermore 
National  Laboratory,  the  University  of  Michigan,  the  University  of  Colorado, 
aiid  other  departments  at  the  University  of  Washington.  The  work  from  this 
project  has  been  presented  at  international  conferences  and  one  publication 
has  already  been  published  in  a  refereed  journal  and  another  publication 
has  been  accepted  for  publication  pending  revisions. 

2  Project  Description 

Plasmas  are  essential  to  many  technologies  that  are  important  to  the  Air 
Force,  some  of  which  have  dual-use  potential.  These  applications  include 
nuclear  weapons  effects  simulations,  radiation  production  for  counter  pro¬ 
liferation,  fusion  for  power  generation,  and  advanced  plasma  thrusters  for 
space  propulsion.  In  general,  plasmas  fall  into  a  density  regime  where  they 
exhibit  both  collective  (fluid)  behavior  and  individual  (particle)  behavior. 
Many  plasmas  of  interest  can  be  modeled  by  treating  the  plasma  like  a  con¬ 
ducting  fluid  and  assigning  macroscopic  parameters  that  accurately  describe 
its  particle-like  interactions.  The  magnetohydrodynamic  (MHD)  model  is  a 
plasma  model  of  this  type. 
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2.1  Research  Objectives 

The  objectives  of  the  project  are  to: 

•  Develop  a  coupled,  implicit,  time-accurate  algorithm  for  three-dimensional, 
viscous,  resistive  MHD  simulations; 

•  Incorporate  the  algorithm  into  the  MACHS  code,  which  was  developed 
at  the  Air  Force  Phillips  Laboratory; 

•  Validate  the  code  with  analytical  and  experimental  data;  and 

•  Apply  the  code  to  analyze  plasma  experiments  at  the  University  of 
Washington  [Helicity  Injected  Tokamak  (HIT) [2]]  and  at  the  Phillips 
Laboratory  [the  liner  implosion  system  (WFX)[3],  the  dense  plasma 
focus  experiment,  and  magnetic  flux  compression  generators]. 

2.2  Technical  Description 
2.2.1  MHD  Plasma  Model 

The  three-dimensional,  viscous,  resistive  MHD  plasma  model  is  a  set  of 
mixed  hyperbolic  and  parabolic  equations.  The  Navier-Stokes  equations  are 
also  of  this  type.  This  project  applies  some  advances  that  have  been  made  in 
implicit  algorithms  for  the  Navier-Stokes  equations  to  the  MHD  equations. 
These  implicit  algorithms  solve  the  equation  set  in  a  fully  coupled  manner, 
which  generates  better  accuracy  than  the  current  methods  used  for  MHD 
simulations. 

When  expressed  in  conservative,  non-dimensional  form,  the  MHD  model 
is  described  by  the  following  equation  set. 


p 

pv 

d 

pv 

-kV- 

pvv  —  BB  -t-  (p  -f  B  •  B/2)  I 

dt 

B 

vB  -  Bv 

e 

(e  4-  p  +  B  •  B/2)  V  —  (B  •  v)  B 

0 

{ReAl)-^  f 
{RmAl)~^  H(^,  B) 

(ReAl)-'^  V  •  f  -  (RmAl)-'^  ^  ■  (V  x  B)  x  B  +  (PeAZ)”^  k  ■  VT 


(1) 


The  variables  are  density  (p),  velocity  (v),  magnetk  induction  (B),  pressure 
(p),  energy  density  (e),  and  temperature  (T).  H(p,B)  is  the  transverse 
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resistive  electric  field  tensor  which  is  described  in  Section  2.2.8.  Mi  is  the 
ion  mass.  The  energy  density  is 


e  = 


P 

7-1 


vv  BB 


(2) 


where  7  —  Cp/cv  is  the  ratio  of  the  specific  heats.  The  non-dimensional 
tensors  are  the  stress  tensor  (f),  the  electrical  resistivity  (^),  and  the  thermal 
conductivity  (k),  and  I  is  the  identity  matrix.  The  non-dimensional  numbers 
are  defined  as  follows: 


Alfven  Number  :  Al  =  VaIV 

Reynolds  Number  :  Re  =  LV/u 

Magnetic  Reynolds  Number  :  Rm  =  fioLVIrj 
Peclet  Number  :  Pe  =  LVJk 


(3) 


The  characteristic  variables  are  length  (L),  velocity  (F),  Alfven  speed  (Va  = 
B I ^yfJLoP)^  kinematic  viscosity  (z/),  electrical  resistivity  (77),  and  thermal  dif- 
fusivity  (k  =  kfpcp).  Po  is  the  permeability  of  free  space  {An  x  10”*^). 

For  convenience,  the  MHD  equation  set  [eqn(l)]  is  rewritten  in  the  fol¬ 
lowing  compact  form 

^  +  V.n  =  V.fp,  (4) 

where  Q  is  the  vector  of  conservative  variables,  Th  is  the  tensor  of  hyperbolic 
fluxes,  and  Tp  is  the  tensor  of  parabolic  fluxes.  The  forms  of  these  vectors 
and  tensors  can  be  seen  from  eqn(l).  The  hyperbolic  fluxes  are  associated 
with  wave-like  motion,  and  the  parabolic  fluxes  are  associated  with  diffusion¬ 
like  motion. 


2.2.2  Algorithm  Overview 

Because  of  the  natural  differences  between  hyperbolic  and  parabolic  equa¬ 
tions,  the  methods  for  solving  them  are  very  different.  Since  the  MHD 
equations  are  of  mixed  type  the  hyperbolic  and  parabolic  terms  must  be 
handled  differently.  The  hyperbolic  fluxes  are  differenced  by  applying  an 
implicit,  approximate  Riemann  algorithm  that  properly  accounts  for  their 
wave-like  behavior.  The  parabolic  terms  are  discretized  by  applying  explicit 
central  differencing. 

The  design  of  the  overall  algorithm  is  primarily  driven  by  the  numerical 
techniques  that  must  be  used  to  discretize  the  hyperbolic  terms.  Therefore, 
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we  begin  by  considering  the  ideal  MHD  equations,  which  are  obtained  from 
eqn(4)  by  setting  all  the  parabolic  terms  (Tp)  to  zero. 

In  one  dimension  they  are 


dt  dx  dt  dx 


(5) 


where  F  is  the  flux  vector  in  the  x  direction  (i.e.  Th  =  {F,  G,  H))  and  A  is 
its  Jacobian. 


(6) 


Here,  Q  is  the  vector  of  conserved  variables: 

Q  =  (p,  PVx,  pvy,  pVz,  By,  Bz,  ef  .  (7) 


This  is  a  set  of  hyperbolic  equations  and  thus  A  has  a  complete  set  of  real 
eigenvalues  given  by 

A  =  '^x  i  Vfast^  '^x  ^  ^slow^  '^x  i  Vax)  »  (8) 


where  Vfast  and  Vsiow  are  the  fast  and  slow  magnetosonic  speeds,  and  Vax 
is  the  Alfven  speed  based  on  the  x  component  of  the  magnetic  field.  These 
can  be  expressed  as 


cl  +  Vl  +  ^[cl  +  Vlf-AclVl^ 


(9) 


V^(ci  + 171)2  _  4,2^2^ 


(10) 


vL  = 


Pop' 


Here,  Cg  is  the  ion  sound  speed,  which  for  a  perfect  gas  is 


(11) 


c 


2 

s 


IP 

P  * 


(12) 


Information  propagates  along  characteristics  which  travel  at  wave  speeds 
given  by  the  eigenvalues.  Most  modern  numerical  techniques  for  solving 
hyperbolic  equations  are  based  upon  the  idea  of  splitting  the  fluxes  into 
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components  due  to  left  and  right  running  waves.  Then  each  part  of  the  flux 
can  be  differenced  in  an  upwind  manner,  which  greatly  reduces  numerical 
oscillations  and  stabilizes  the  solutions. 

It  is  well  known  that  if  a  hyperbolic  equation  is  solved  with  an  explicit 
scheme,  then  the  allowable  time  step  to  maintain  numerical  stability  is  given 
by  the  CFL  (Courant-Priedrichs-Lewy)  condition,  which  in  the  case  of  the 
ID  MHD  equations  is 


At  < 


Ax 

\Vx  +  ^astl 


(13) 


For  the  high  magnetic  fields  and  low  densities  common  in  many  plasma 
experiments,  the  fast  magnetosonic  speed  is  quite  high,  and  thus  the  time 
step  is  prohibitively  small.  We  are  often  interested  in  only  modeling  the 
physics  that  occurs  slower  than  Alfven  time  scales.  For  example,  it  can  be 
shown  that  resistive  tearing  modes,  which  are  important  in  studying  fusion 
plasmas,  evolve  on  a  time  scale  that  is  given  by [4] 


Ttearing  OC  Ta-  (14) 


ta  is  the  Alfven  time,  t,,  is  the  resistive  diffusion  time,  and  Lu  is  the 
Lundquist  number,  which  is  given  by 

Lu  =  ^^  RmAl.  (15) 

Ta 

If  Lu  is  10®,  which  is  typical  for  laboratory  plasmas  in  fusion  applications, 
the  resistive  tearing  time  is  approximately  4000  times  larger  than  the  Alfven 
time.  By  treating  the  hyperbolic  fluxes  implicitly  in  time,  the  stability 
restriction  on  the  time  step  is  removed,  and  the  solution  can  be  advanced  at 
the  larger  resistive  tearing  time  step.  This  is  our  motivation  for  proposing 
an  implicit  scheme. 

The  starting  point  for  deriving  the  algorithm  is  the  two-dimensional  ideal 
MHD  equations  in  Cartesian  form 


^  ^  ^  -n 

dt  dx  ^  dy 


(16) 


We  then  discretize  eqn(16)  using  first  order  Euler  time  differencing  to  get 

-  Q7i) 

^  ^  =  -Rij  (17) 
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where  R  is 


-  ^1+1/2, j  -  Fi-l/2,j  +  Gij+i/2  -  Gij_i/2.  (18) 

Note  that  in  this  equation  and  all  that  follow  the  grid  metric  terms  (cell 
areas  and  volumes)  are  omitted  for  clarity.  We  linearize  R  as  follows; 

'»•«.”+ (^)”  («r‘  -  <?t)  (15) 

Substituting  this  expression  back  into  eqn(17)  and  rearranging,  we  get 


Here  AQ  is  defined  as 


=  (21) 

The  left  hand  side  of  the  eqn(20)  is  an  implicit  operator  operating  on  AQ. 
It  is  a  large  banded  block  matrix.  In  three  dimensions,  it  is  an  {Imax  x 
Jmax  X  Kmax)  by  (Imax  X  Jmax  X  Kmax)  matrix,  where  Imax  is  the  number  of 
cells  in  the  x  direction,  etc.  It  is  quite  costly  to  invert  a  matrix  of  this  size 
directly.  We  choose  to  invert  it  using  an  approximate  factorization,  which 
can  be  done  more  efficiently.  When  solved  this  way,  eqn(20)  is  no  longer 
time  accurate.  However,  we  can  still  achieve  time  accuracy  with  this  type 
of  scheme  by  adding  the  time  derivative  of  Q  as  a  source  term  to  the  right 
hand  side  of  the  equation.  We  then  have 

(22) 


where 

The  r  in  eqn(22)  can  be  thought  of  as  a  pseudo  time  variable.  At  each 
physical  time  step,  eqn(22)  is  solved  iteratively  in  pseudo  time  until  the  left 
hand  side  vanishes.  When  the  solution  converges,  our  original  equation 


(24) 
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is  solved.  This  technique  is  known  as  dual  time-stepping.  [5]  Note  that  in 
eqn(23)  a  more  accurate  time  derivative  can  be  used  at  the  expense  of  the 
additional  memory  needed  to  store  the  Q  vectors  from  previous  time  steps. 

One  advantage  of  the  strategy  outlined  above  is  that  the  implicit  op¬ 
erator  and  the  right  hand  side  in  eqn(20)  are  decoupled.  The  structure 
of  the  matrix  no  longer  depends  on  the  details  of  the  discretization  of  the 
right  hand  side  fluxes.  In  the  following  sections  we  will  describe  the  relax¬ 
ation  scheme  that  is  used  to  iteratively  invert  the  implicit  operator  and  the 
approximate  Riemann  solver  that  is  used  to  form  the  right  hand  side  fluxes. 

2.2.3  LU-SGS  Relaxation  Scheme 

We  use  the  lower-upper  symmetric-Gauss-Seidel  (LU-SGS)  method  to  it¬ 
eratively  invert  the  implicit  operator.  [6]  To  derive  this  method,  we  first 
consider  the  following  first  order  accurate  flux-vector  splitting  of  R  (at  time 
level  n  -h  1)  in  eqn(17): 

Rii  =  +  GJ  -  Gt  +  Gy+,  -  G-.  (25) 

where  is  the  portion  of  the  F  flux  vector  corresponding  to  right-running 
waves,  and  F~  is  the  portion  corresponding  to  left-running  waves,  and 
and  G~  are  similarly  defined.  This  equation  is  linearized  to  obtain 

{!  -k  At  (4  -  AUj  +  A'+IJ  -  4  +  -  Kj-i  +  -  B-) } 

X  AQfj  =  -AtR2j 

(26) 


where  A'^  is  the  Jacobian  of  and  so  on.  We  approximate  these  Jacobians 
as 


A*  =  \(A  +  i>a) 

(27) 

a-  =  \(a-pa) 

(28) 

where  pA  is  the  maximum  eigenvalue  (spectral  radius)  of  A.  If  we  then 
iteratively  solve  this  simplified  implicit  operator  using  a  forward  Gauss- 
Seidel  sweep  followed  by  a  backward  sweep,  the  resulting  algorithm  can  be 
written  as 

+  At 

\pA  +  PB)l-AU^^-Btj_^] 

X  |l  -f  At  ^PA  +  pb)  I  +  4i,i  +  4i+i]  } 

X  AQf^.  =  -  [1  -k  At  {pA  +  pb)]  ^tR2j 

(29) 
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The  forward  sweep  is  equivalent  to  inverting  a  lower  block  diagonal  matrix 
[the  first  braced  term  in  eqn(29)],  and  the  backward  sweep  is  equivalent  to 
inverting  an  upper  block  diagonal  matrix  [second  braced  term  in  eqn(29)]. 
This  structure  leads  naturally  to  several  vectorization  and  parallelization 
strategies. 

If  we  sweep  through  the  computational  domain  along  lines  of  constant 
i+  j  (in  2-D),  each  term  along  these  lines  is  independent  of  the  others  and 
depends  only  on  data  that  has  already  been  updated  during  the  cmrent 
sweep.  This  type  of  fine  grain  parallelization  is  ideal  for  vector  computers. 
However,  that  degree  of  parallelism  is  not  efficient  for  parallel  computers 
because  the  extra  communication  time  between  processors  exchanging  data 
more  than  offsets  the  gain  in  computational  efficiency.  To  optimize  this 
algorithm  for  a  parallel  architecture,  we  need  to  break  up  the  computational 
domain  into  blocks  and  send  each  block  to  a  different  processor.  At  the 
boundaries  between  the  blocks,  we  reduce  the  data  dependency  between  the 
blocks  by  using  data  from  the  previous  time  step  along  the  block  boundaries. 
This  effectively  reduces  those  points  into  a  Jacobi  iteration.  However,  the 
interior  points  are  still  solved  with  a  Gauss-Seidel  iteration.  As  long  as 
the  blocks  are  large  enough  that  there  are  many  more  interior  points  than 
boundary  points,  then  the  overall  convergence  rate  is  approximately  the 
same  as  Gauss-Seidel. 

2.2.4  Approximate  Riemann  Solver 

The  fiiixes  on  the  right  hand  side  of  eqn(20)  are  discretized  using  a  Roe-type 
approximate  Riemann  solver.  [7]  In  this  method  the  overall  solution  is  built 
upon  the  solutions  to  the  Riemann  problem  defined  by  the  discontinuous 
jump  in  the  solution  between  each  pair  of  cells.  The  numerical  flux  for  a 
first-order  accurate  (in  space)  Roe-type  solver  is  written  in  symmetric  form 
as 

^{+1/2  —  2  (-^Hl  d"  “  2  ^  Qj)\^k  \  (30) 

where  Tk  is  the  right  eigenvector,  is  the  absolute  value  of  the 
eigenvalue,  and  Ik  is  the  k^^  left  eigenvector.  The  values  at  the  cell  interface 
(i4-l/2)  are  obtained  by  a  simple  average  of  the  neighboring  cells.  These  first 
order  accurate  upwind  fluxes  are  used  in  the  vicinity  of  sharp  discontinuities 
in  order  to  suppress  oscillations  in  the  solution.  We  achieve  a  globally  second 
order  accurate  solution  by  using  a  flux  limiter  that  modifies  the  first  order 
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flux  so  that  it  uses  second  order  central  differencing  in  smooth  portions  of 
the  flow.  We  are  using  a  minmod  limiter. [8] 

Once  the  eigenvalues  and  eigenvectors  are  obtained  and  properly  nor¬ 
malized  to  avoid  singularities,  it  is  relatively  straight-forward  to  apply  this 
scheme  to  the  one-dimensional  ideal  MHD  equations. [9,  10]  Unlike  for  the 
Euler  equations,  the  extension  to  more  than  one  dimension  is  non-trivial. 
The  reason  is  that  in  more  than  one  dimension,  the  Q  vector  must  in¬ 
clude  Bx  in  addition  to  the  other  magnetic  field  components.  (For  the 
one-dimensional  case  Bx  is  constant  by  virtue  of  V  •  B  =  0).  Since  the  j  x  B 
force  acts  perpendicularly  to  the  directions  of  j  and  B,  the  F  flux  vector 
has  a  zero  term  corresponding  to  Bx-  Thus,  the  Jacobian  matrix  of  F  is 
singular  and  has  a  zero  eigenvalue.  This  means  we  no  longer  have  a  complete 
set  of  physically  meaningful  eigenvectors.  Physically,  we  expect  information 
to  travel  either  at  the  fluid  velocity  or  at  the  fluid  velocity  plus  or  minus 
the  wave  speeds.  Simply  dropping  Bx  from  the  equation  set  is  not  a  viable 
option,  because  Bx  needs  to  change  in  order  to  maintain  V  •  B  =  0.  Powell 
et  al.,  recently  solved  this  problem  by  modifying  the  Jacobian  in  such  a  way 
as  to  change  the  zero  eigenvalue  to  Vx  (keeping  the  others  unchanged),  and 
then  adding  in  a  source  term  that  exactly  canceled  out  the  terms  introduced 
by  the  modified  Jacobian.  [11] 

The  source  term  is 


Sdiy  — 


p 

B 

V 

V  B 


VB 


It  is  proportional  to  the  divergence  of  B  and  thus  very  small. 


(31) 


2.2.5  General  Multiblock  Implementation 

General  multiblock  grid  capability  has  been  implemented  in  our  code.  This 
capability  allows  the  application  of  the  code  to  arbitrarily  complex  geome¬ 
tries  which  is  one  of  the  primary  project  objectives.  MACHS  also  uses  a 
multiblock  grid  though  not  as  general  as  the  one  implemented  here.  The 
grid  that  spans  the  physical  domain  of  interest  is  composed  of  blocks  which 
have  a  local  grid.  Blocks  share  faces  with  adjacent  blocks  thus  allowing 
information  to  pass  through  the  whole  domain.  The  only  restriction  is  that 
the  cells  on  a  block’s  face  must  map  one  to  one  to  the  cells  on  the  adja¬ 
cent  block’s  face.  An  advantage  of  this  approach  is  that  the  user  is  able  to 
generate  highly  orthogonal  grids  with  cell  aspect  ratios  close  to  unity  even 
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Figure  1:  Top  view  of  a  non-simply  connected  grid.  (Only  the  block  bound¬ 
aries  are  shown  for  clarity.) 


for  complex  geometries.  This  approach  is  readily  adaptable  to  paralleliza¬ 
tion  by  domain  decomposition  where  the  blocks  are  distributed  among  the 
processors  of  a  parallel  computer. 

An  advantage  of  multiblock  grids  is  the  flexibility  they  provide  for  com¬ 
plex  geometries.  This  flexibility  is  even  greater  if  there  is  no  limitation  to 
simply-connected  blocks.  See  Figure  1  for  an  example  of  a  non-simply  con¬ 
nected  block  grid.  This  means  that  an  arbitrary  number  of  blocks  may  meet 
at  a  block  a  vertex.  An  example  of  this  type  of  grid  is  illustrated  in  Section 
3.7. 

The  MHD  algorithm  is  now  nested  within  a  loop  over  grid  blocks.  Logic 
has  been  implemented  in  the  new  algorithm  that  allows  the  user  to  freely 
distribute  the  blocks  on  processors.  A  static  load  balancing  algorithm  can  be 
implemented  to  distribute  the  blocks  such  that  the  number  of  cells  on  each 
processor  is  approximately  equal.  This  strategy  balances  the  load  among  the 
processors  and  makes  the  most  efficient  use  of  the  parallel  computer.  The 
appropriate  relationship  between  the  blocks  is  maintained  by  a  connectivity 
matrix.  Blocks  sharing  a  face  transfer  data  either  by  simply  copying  if  they 
reside  on  the  same  processor  or  by  message  passing  if  they  reside  on  different 
processors. 

Multiblock  implementation  led  to  memory  management  issues  that  we 
solved  by  using  Fortran  90.  The  most  useful  feature  used  was  dynamic 
memory  allocation.  Dynamic  memory  allocation  allows  the  specification  of 
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array  sizes  at  run  time  and  leads  to  a  significant  memory  savings  compared 
to  static  memory  allocation,  where  a  maximum  overall  array  size  has  to  be 
specified  at  compile  time.  Memory  allocation  issues  are  exacerbated  in  the 
multiblock  algorithm  where  blocks  can  have  widely  varying  numbers  of  cells. 
While  most  Fortran  77  compilers  implement  dynamic  memory  allocation, 
the  implementations  are  not  standard.  Using  Fortran  90  we  maintained  the 
portability  of  the  code. 

2.2.6  Second-Order  Accurate  Boundary  Conditions 

For  simplicity  an  early  version  of  the  algorithm  used  first-order  accurate 
boundary  conditions.  Practically  this  meant  that  a  single  layer  of  ghost 
cells  were  used  for  all  boundaries.  This  arrangement  was  shown  to  lower 
the  overall  accuracy  of  the  algorithm.  We  have  since  implemented  second- 
order  accurate  boundary  conditions  by  using  two  layers  of  ghost  cells.  The 
second-order  accurate  boundary  conditions  ensure  that  our  simulations  are 
completely  independent  of  how  the  grid  is  decomposed. 

Since  the  code  is  three  dimensional  the  data  passed  between  blocks  is 
represented  as  a  pair  of  two  dimensional  structures.  Transfer  of  these  two 
dimensional  structures  by  message  passing  uses  derived  data  types.  The 
Message  Passing  Interface  (MPI)  provides  a  set  of  procedures  for  defining 
derived  data  types. [16]  A  derived  data  type  is  a  template  that  describes  how 
a  complex  structure  is  built  from  a  more  basic  data  type.  In  our  approach  the 
most  basic  unit  of  data  is  a  cell.  A  cell  contains  the  eight  conserved  variables 
mass  density  (p),  components  of  momentum  pvy^  pvz)^  components  of 
magnetic  field  {Bx,  By,  Bz)  and  energy  density  (e).  A  cell  can  be  thought  of 
as  a  zero  dimensional  data  structure.  A  strip  is  a  one  dimensional  structure 
made  of  cells.  A  face  is  a  two  dimensional  structure  made  of  strips.  Finally, 
a  double  face  is  made  of  two  adjacent  faces.  The  composition  of  the  derived 
data  type  is  illustrated  in  Figure  2. 

Use  of  derived  data  types  is  more  general  and  speeds  the  message  trans¬ 
fer,  compared  to  explicitly  constructing  the  messages.  Our  results  show  a 
three  to  ten  fold  improvement  in  message  passing  time  when  derived  data 
types  are  used. 

2.2.7  Unaligned  Finite  Volumes 

The  deficiency  revealed  by  the  spheromak  simulations  was  corrected  by  re¬ 
casting  the  algorithm  using  an  unaligned  finite  volume  formulation.  Using 
the  identical  finite  formulation  for  the  hyperbolic  fiuxes  as  for  the  parabolic 
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Figure  2:  Hierarchy  of  the  derived  data  types.  The  most  basic  derived  data 
structure  is  the  cell. 

fluxes  is  essential  to  achieving  our  objective  of  a  fully-coupled  MHD  code. 
The  previous  algorithm  used  a  generalized  coordinates  approach  to  calcu¬ 
late  the  fluxes  across  cell  interfaces.  [17]  These  fluxes  are  used  to  update  the 
solution  at  the  next  time  step.  The  flux  in  generalized  coordinates  is 

=  +  +  (32) 

where  F,  G  and  H  are  the  fluxes  in  Cartesian  coordinates,  n^,  riy  and  riz 
are  the  components  of  the  unit  normal  vector,  and  ^  =  1,2,3  stands  for  each 
of  the  three  generalized  coordinates  (C,77,C)-  These  generalized  coordinates 
fluxes  are  calculated  in  the  cells  to  the  left  and  right  of  the  interface. 
The  flux  at  the  interface  is  obtained  by  averaging  the  two. 

1  1  ^ 

+  ^i+i)  -  ^  (33) 

^  ^  fe=i 

where  ak  axe  the  wave  strengths,  Xk  are  the  wave  speeds,  and  are  the  right 
eigenvectors  of  the  flux  Jacobian.  The  eigenvectors  and  eigenvalues  of  the 
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Figure  3:  Schematic  drawing  of  a  finite  volume  cell  with  local  coordinate 
systems  at  its  interfaces.  The  fluxes  are  only  needed  along  the  Xi  directions. 


flux  Jacobian  {dJ^/dQ)  are  evaluated  using  average  values  of  the  conserved 
variables.  Presently  the  code  uses  arithmetic  averages. 

In  the  new  approach,  the  fluxes  at  an  interface  are  calculated  based  on  a 
locally  aligned  coordinate  system.  The  Riemann  problem  is  then  solved  at 
the  interface  in  a  natural  direction  to  generate  fluxes  that  are  automatically 
orthogonal  to  the  interface.  A  divergence  theorem  can  now  be  applied  to 
the  finite  volume  cell  to  calculate  the  change  in  the  conserved  variables.  See 
Figure  3  for  an  illustration  of  the  local  coordinate  systems.  The  unaligned 
flnite  volume  methods  is  exactly  the  same  method  that  has  already  been 
implemented  for  the  calculation  of  the  parabolic  fluxes.  Using  the  same 
approach  for  the  hyperbolic  fluxes  will  make  the  code  consistent. 

We  locally  rotate  the  coordinate  system  such  that  one  axis  is  normal 
to  the  interface.  The  Riemann  problem  is  solved  along  the  axis  normal  to 
the  face.  The  rotation  is  kept  consistent  by  performing  a  two  step  rotations 
about  the  Cartesian  coordinates.  We  calculate  the  angles  between  each  of 
the  original  axes  (x®,  y®,  z^)  and  the  normal  to  the  face.  The  axis  corre¬ 
sponding  to  the  minimum  angle  is  aligned  with  the  normal.  For  example, 
if  is  to  be  aligned  with  the  normal  then  the  first  rotation  is  about  the 

axis  with  an  angle  6i,  The  second  rotation  is  about  y',  with  an  angle 
02,  which  aligns  the  coordinate  system  with  the  interface  normal.  The  new 
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Figure  4:  Rotation  of  a  cell  coordinate  system  that  aligns  x°  with  the  normal 
to  the  cell  face  h. 


coordinate  axes  are  (xi,  yi,  zi).  For  this  case  the  angles  are 


Q\  =  atan 


Till 


Ux 


O2  —  —atari 


riz 


nj  +  n^ 


(34) 


and  the  rotation  matrices  are 


Rim  - 
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0 
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sinOi 

cos9i 
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,  R2i02)  = 
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1 
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—sin92 

0 

COs92 

.  (35) 


The  rotation  is  illustrated  in  Figure  4. 

Once  the  components  of  the  vector  fields  (v  and  B)  are  rotated  the 
Riemann  problem  is  solved  and  the  flux  along  the  normal  to  the  face  is 
calculated.  Then  the  conserved  variables  are  updated  and  the  vector  fields 
are  rotated  back  to  the  original  Cartesian  coordinate  system. 

Since  the  rotation  matrices  are  orthogonal  their  inverses  are  equal  to 
their  transposes  =  ItF)  so  that  the  extra  work  to  be  performed  by 

the  algorithm  at  each  cell  is  minimal  (27  floating  point  operations).  This 
method  is  currently  implemented  and  is  being  tested. 


2.2.8  Parabolic  MHD  Terms 

To  this  point  we  have  only  considered  the  hyperbolic  terms.  When  finite 
viscousity  and  resistivity  are  included,  the  parabolic  terms  of  the  MHD 
equations  [right  hand  side  of  eqn(l)]  become  important.  For  reasonably  large 
values  of  Re  and  Rm  (easily  in  the  range  of  interest  for  most  applications), 
the  parabolic  terms  can  be  differenced  explicitly  without  constraining  the 
allowable  time  step.  In  this  work  we  difference  the  parabolic  terms  explicitly 
in  time  with  central  differences  in  space.  They  are  added  to  the  right  hand 
side  fluxes  arising  from  the  approximate  Riemann  solver. 
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(a)  (b) 


Figure  5:  Positioning  for  (a)  vertex-centered  and  (b)  face-centered  parabolic 
fluxes.  The  face-centered  fluxes  produced  more  accurate  and  stable  solu¬ 
tions. 


During  the  past  year,  we  have  spent  a  concerted  effort  on  the  parabolic 
terms  to  achieve  accurate  and  stable  calculations.  Originally  we  used  the 
same  flux  centering  scheme  that  is  used  in  MACHS  where  the  fluxes  are 
calculated  at  the  cell  vertices  and  a  divergence  law  is  applied  around  the 
cell  center.  See  Figure  5(a).  A  detailed  stability  analysis  demonstrates  the 
potential  for  grid  decoupling  and  a  resulting  odd-even  instability.  [This  re¬ 
sult  has  important  implications  to  all  ALE  (arbitrary  Lagrangian-Eulerian) 
codes  and  will  soon  be  submitted  to  a  journal.]  Locating  the  parabolic 
fluxes  at  the  cell  faces  which  corresponds  to  the  location  of  the  hyperbolic 
fluxes  produced  solutions  that  converged  faster  and  were  more  accurate  than 
locating  the  parabolic  fluxes  at  the  cell  vertices.  See  Figure  5(b). 

We  also  point  out  that  the  resistive  electric  field  term  in  eqn(l)  is  differ¬ 
ent  than  the  one  commonly  used  and  presented  last  year  V  VB  which  does 
not  hold  for  spatially  dependent  anisotropic  resistivity.  Plasma  resistivity 
is  a  strong  function  of  temperature  and  of  the  orientation  to  the  magnetic 
field.  Therefore,  the  assumption  of  spatially  constant  isotropic  resistivity  is 
incorrect.  The  new  term  reduces  from  the  conservative  formulation  of  the 
more  general  equation. 
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(37) 


where  the  transverse  resistive  electric  field  tensor  is  defined  as 

^  r  0  riz{dxBy-dyBx)  rjyidxBz  -  dzBx) 

^  =  T)z  {dyBx  ~  dxBy)  0  Tjx  {dyBz  —  dzBy) 

_  r)y  {dzBx  “  dxBz)  Vx  {dzBy  -  dyBz)  0 

The  dimensionless  numbers  have  been  removed  for  clarity. 


3  Benchmarks  and  Applications 

3.1  Ideal  MHD  Test  Problems 

3.1.1  One-Dimensional  Coplanar  MHD  Riemann  Problem 

This  test  problem  served  to  validate  the  approximate  Riemann  solver,  be¬ 
cause  the  computed  solution  could  be  checked  against  the  exact  analytical 
solution.  For  the  one-dimensional  ideal  MHD  equations  (variations  in  x 
only),  the  equation  for  Bx  reduces  to  Bx  is  constant  and  drops  from  the 
equation  set,  eliminating  the  zero  eigenvalue  in  this  case.  The  coplanar  MHD 
equations  are  obtained  from  the  full  one-dimensional  ideal  MHD  equations 
by  setting  Bz  and  Vz  to  zero,  thus  allowing  only  planar  flow  and  fields.  This 
eliminates  the  Vx  ±  Vax  eigenvalues,  leaving  a  system  of  five  equations  with 
five  eigenvalues.  Mathematically,  the  Riemann  problem  is  an  initial  bound¬ 
ary  value  problem  in  which  there  is  initially  a  discontinuity  in  the  data  such 
that  the  left  half  of  the  domain  is  at  one  state  and  the  right  half  of  the 
domain  is  at  another  state.  As  the  solution  evolves  in  time,  shock  waves 
and  rarefaction  waves  form  and  travel  at  speeds  related  to  the  wave  speeds 
of  the  system.  Although  not  physically  realizable  in  plasmas,  this  problem 
is  analogous  to  a  shock  tube  in  hydrodynamics. 

For  the  full  five-wave  case,  there  is  not  a  closed  form  analytical  solution. 
Instead,  the  solution  must  be  checked  by  calculating  generalized  Riemann 
invariants  across  the  rarefaction  waves  and  Rankine-Hugoniot  jump  condi¬ 
tions  across  the  shock  waves.  Since  this  has  already  been  done  by  Brio  and 
Wu[9]  for  a  specific  set  of  conditions,  for  our  test  case  we  used  the  same 
initial  conditions  as  they  used  in  order  to  allow  direct  comparison  with  their 
published  solution.  The  initial  left  state  was  p  =  1,  p  =  1,  and  By  =  1.  The 
initial  right  state  was  p  =  0.1,  p  =  0.125,  and  By  =  —1.  The  velocities  were 
zero  and  Bx  was  0.75.  Figure  6  shows  the  initial  density  distribution  and 
its  numerical  solution  after  400  time  steps  on  an  800  point  grid  with  a  CFL 
number  of  0.8.  Figure  7  is  the  corresponding  plot  of  the  transverse  magnetic 
field  {By).  The  solution  was  computed  using  explicit  time-stepping.  The  so¬ 
lution  clearly  shows  five  waves  formed  corresponding  to  the  five  eigenvalues. 
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Figure  6:  Numerical  solution  of  coplanar  Riemann  problem.  Density  profile 
is  shown  initially  and  after  solution  has  evolved  for  400  time  steps. 


They  are  a  fast  rarefaction  wave,  a  slow  shock,  a  contact  surface  moving  to 
the  right,  a  slow  compound  wave  (rarefaction  and  shock),  and  a  fast  rar¬ 
efaction  wave  moving  to  the  left.  Note  that  the  numerical  method  is  able 
to  resolve  the  shocks  over  a  few  grid  points  without  introducing  numerical 
oscillations.  This  is  one  of  the  advantages  of  the  flux  splitting  approach 
we  have  used.  The  computed  solution  overlaid  exactly  on  Brio  and  Wu’s 
published  solution. 

If  we  set  jBx  =  0  above,  then  the  problem  reduces  to  a  hydrodynamic 
shock  tube  problem  if  one  replaces  the  thermodynamic  pressure  by  the  sum 
of  the  thermodynamic  and  magnetic  pressures.  For  this  case  one  can  find  a 
closed  form  exact  solution  to  compare  to  the  calculated  solution.  Figure  8 
shows  both  the  calculated  and  the  exact  solution  for  p  +  B^/2  after  80  time 
steps  on  a  100  point  grid.  There  is  very  good  agreement  with  the  plateau 
values  and  the  shock  is  resolved  in  a  few  cells  without  numerical  oscillations. 
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Figure  7:  Numerical  solution  of  coplanar  Riemann  problem.  Transverse 
magnetic  field  profile  is  shown  initially  and  after  solution  has  evolved  for 
400  time  steps. 

3.1.2  Oblique  Shock 

This  steady-state  problem  served  primarily  as  a  test  of  the  LU-SGS  implicit 
relaxation  scheme.  It  also  allowed  us  to  examine  the  divergence  of  B  at  each 
point  to  ensure  that  the  the  zero  eigenvalue  fix  was  correctly  implemented. 

The  geometry  for  these  tests  is  shown  in  Figure  9.  A  super- Alfvenic  flow 
(Mach  number  of  3)  impinges  on  a  perfectly  conducting  plate  at  an  angle 
of  25  degrees.  In  addition,  a  vertical  field  of  By  =  0,2  is  imposed  at  the 
left  boundary.  Since  the  plate  is  perfectly  conducting,  the  component  of  the 
magnetic  field  normal  to  the  plate  is  held  at  zero. 

Figure  10  shows  the  steady-state  solution  of  this  problem.  Contours  of 
density  and  magnetic  field  lines  are  plotted.  The  density  contours  show 
that  an  oblique  shock  forms,  as  expected.  Outside  of  the  shock,  the  field 
is  convected  in  from  the  boundary.  At  the  shock,  the  field  lines  bend  due 
to  the  change  in  direction  of  the  flow  at  the  shock.  Finally,  the  field  lines 
bend  at  the  conducting  wall  as  all  the  field  is  converted  to  Bx  to  satisfy  the 
boundary  condition  while  keeping  the  divergence  of  B  equal  to  zero.  We 
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Figure  8:  Comparison  of  numerical  and  exact  solution  of  coplanar  Riemann 
problem  for  Bx  —  0  case. 


verified  that  the  divergence  was  less  than  10”^^  throughout  the  domain. 

This  two-dimensional  steady-state  solution  was  obtained  with  explicit 
time  stepping  at  a  CFL  number  of  0.8  and  with  the  LU-SGS  implicit  relax¬ 
ation  scheme  at  an  infinite  CFL  number  (approximate  Newton  iteration). 
Figure  11  is  a  plot  of  the  logarithm  of  the  two-norm  of  the  residual  of  the  en¬ 
ergy  equation  as  a  function  of  the  number  of  iterations.  The  implicit  scheme 
converged  to  10”^"^  in  about  150  iterations,  whereas  the  explicit  scheme  re¬ 
quired  about  700  iterations.  This  is  an  acceleration  factor  of  about  4.5  for 
the  implicit  scheme.  Higher  acceleration  factors  can  be  achieved  for  finer 
grids. 


3.2  Viscous  and  Resistive  MHD  Test  Problems 

The  viscous  and  resistive  terms  in  the  MHD  equations  comprise  the  right 
hand  side  of  the  equality  in  eqn(l).  The  addition  of  these  terms  to  the 
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Figure  10:  Density  contours  and  field  lines  for  an  M  =  3  flow  impinging  on 
a  perfectly  conducting  plate  at  an  angle  of  25  degrees. 


algorithm  involved  the  modification  of  the  R  vector  in  eqn(20). 

-R^-R-\-V  -fp  (38) 

The  R  vector  is  updated  with  each  iteration  to  produce  a  solution  that  is 
fully  coupled. 

Using  the  divergence  form  of  the  parabolic  terms  reduces  the  differencing 
errors  of  the  method.  To  preserve  the  accuracy  on  irregular  meshes  the 
derivatives  are  computed  using  a  finite  volume  method. 

The  validation  of  the  parabolic  terms  consisted  of  applying  the  code  to 
a  suite  of  test  problems  with  known  analytical  solutions.  We  validated  in¬ 
dependently  the  terms  associated  with  viscosity  and  those  associated  with 
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Figure  11:  Logarithm  of  the  two-norm  of  the  energy  equation  residual  plot¬ 
ted  as  a  function  of  iteration  number  for  explicit  and  implicit  solutions  of 
channel  flow  with  horizontal  velocity  and  vertical  magnetic  field  imposed  at 
the  left  boundary. 

resistivity  and  then  the  combined  effect  of  all  of  the  terms.  The  test  prob¬ 
lems  were:  (1)  fully  developed  laminar  flow  between  two  parallel  plates,  (2) 
magnetic  field  generated  by  a  constant  current  density,  and  (3)  Hartmann 
flow.  All  of  these  test  problems  were  run  until  a  steady-state  solution  devel¬ 
oped.  The  capability  of  the  code  to  capture  time-dependent  physical  effects 
was  also  tested  by  modeling  the  exponential  resistive  decay  of  the  magnetic 
field  generated  in  test  problem  2. 

3.2.1  Laminar  Flow 

We  benchmarked  the  code  to  two  types  of  laminar  flows  between  infinite  par¬ 
allel  plates.  The  plates  restrict  the  steady-state  flow  to  be  one-dimensional. 
No  magnetic  fields  are  present.  This  reduces  the  MHD  equations  to  the 
Navier-Stokes  equations.  In  these  simulations  a  no-slip  boundary  condition 
was  applied  to  the  fluid  in  contact  with  the  plates. 

The  first  type  of  flow  to  which  we  benchmarked  was  viscous  flow  gen¬ 
erated  by  one  plate  moving  relative  to  the  other  plate.  With  no  pressure 
gradient,  constant  viscosity,  and  incompressible  flow,  the  equations  reduce 
to 


V  •  f  =  =  0  (39) 

which  is  Laplace’s  equation.  For  finite  viscosity  {Re)  the  analytical  solution 


21 


Figure  12:  Simulation  of  laminar  flow  between  parallel  plates  in  the  presense 
of  a  constant  pressure  gradient.  The  velocity  profile  is  parabolic  as  expected 
from  the  analytical  solution. 

for  the  flow  velocity  is 

My)  =  T^o(i-|)+VL|  (40) 

where  Vq  is  the  velocity  of  the  plate  at  y  =  0  and  Vl  is  the  velocity  of  the 
plate  at  y  =  L. 

The  errors  between  the  analytical  solution  and  the  code  generated  so¬ 
lution  were  below  10”^  (the  two-norm  of  the  error  between  the  solutions). 
We  performed  the  same  simulation  with  no  viscosity  {Re  — >  oo).  As  would 
be  expected,  the  flow  velocity  vanished  everywhere  except  on  the  plates. 
When  the  viscous  heating  was  modeled,  a  transient  pressure  gradient  p{y) 
and  transverse  velocity  Vy{y)  developed  which  heated  the  flow  and  increased 
its  energy. 

The  next  test  was  laminar  flow  between  stationary  parallel  plates  with 
a  constant  pressure  gradient  in  the  flow  direction.  The  governing  equation 
is 

Vx  •  (pi)  =  ^  =  (Re)-'^  (41) 

The  solution  for  this  flow  is  the  parabolic  equation  given  by 

Figure  12  shows  the  solution  generated  by  the  code.  Again  the  errors 
were  reduced  to  below  10“^. 
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3.2.2  Resistive  Diffusion 

We  benchmarked  the  resistive  diffusion  to  a  current  sheet  with  a  uniform 
current  density.  Values  of  the  tangential  magnetic  field  were  specified  at 
parallel  infinite  plates,  in  a  similar  way  as  the  first  of  the  laminar  flow 
simulations. 

For  no  flow  velocity  and  constant  resistivity  the  MHD  equations  reduce 
to  a  Laplace  equation  similar  to  eqn(  39). 

(Rm)-^  V  •  VB  =  0  (43) 


This  equation  has  the  same  form  for  its  solution  as  eqn(40). 

=  (44) 


where  Bq  is  the  velocity  of  the  plate  at  y  =  0  and  Bl  is  the  velocity  of  the 
plate  a,t  y  =  L. 

The  code  agreed  with  the  analytical  solution  to  within  errors  of  10"^. 

The  time-dependent  resistive  decay  of  a  magnetic  field  can  be  represented 
analytically  by  solving  the  one-dimensional  transverse  magnetic  induction 
equation  with  constant  resistivity. 


dB± 

dt 


=  {Rm)-^ 


d^B± 

dx^ 


(45) 


The  solution  is  the  exponential  decay  of  the  magnetic  field  with  a  sinusoidal 
profile. 


B±{t,x)  (X  exp 


sin(7rx) 


(46) 


for  zero  field  boundary  conditions  at  x  =  0  and  x  =  1. 

This  simulation  was  performed  beginning  with  a  uniform  field  profile. 
The  field  decayed  into  the  expected  sinusoidal  shape  and  the  decay  constant 
agreed  with  the  analjdical  result  to  within  0.01%.  The  same  test  was  re¬ 
peated  on  a  parabolic  clustered  grid  with  Axmax/^Xmin  =  10.  The  same 
accuracy  was  achieved. 


3.2.3  Heu'tmann  Flow 

Hartmann  flow  combines  the  effects  of  viscosity  and  resistivity.  The  problem 
geometry  is  the  same  as  that  for  the  laminar  flow  with  the  addition  of  a 
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Figure  13:  The  Hartmann  flow  geometry  showing  the  moving  parallel  plates 
and  the  cross  magnetic  field. 


magnetic  field  that  is  normal  to  the  plates,  in  the  y  direction.  See  Figure  13 
for  an  illustration. 

The  governing  equations  for  the  Hartmann  flow  can  be  found  by  com¬ 
bining  the  magnetic  field  and  momentum  equations  from  the  MHD  model. 
As  before  there  will  only  be  flow  in  the  x  direction.  However,  an  applied 
electric  field  in  the  z  direction  must  be  included  since  it  can  generate  an 
Eo  X  Bo  flow  in  the  x  direction.  The  Hartmann  flow  is  described  by  the 
differential  equation. 


d'^Vx 


(47) 


where  the  Hartmann  number  is  defined  as 

H  =  =  AlLVReRm.  (48) 


The  analytical  solution  to  the  Hartmann  flow  is 


Vxiy)  =  Vo 
Bo  . 


sinh  {H{1  -  y/L))  smh{Hy/L) 
sinh(i7)  ^  sinh(H) 

sinh  (i?(l  —  y/L))  +  sinh(ify/L) 
sinh(Ff) 


(49) 


where  the  same  no-slip  boundary  conditions  have  been  applied.  In  the 
limit  of  no  magnetic  fleld,  the  solution  reduces  to  the  laminar  flow  solu¬ 
tion,  eqn(40). 

The  response  of  the  magnetic  fleld  can  be  determined  by  solving  the 
magnetic  fleld  equation  for  the  field  component  that  will  be  “dragged”  with 
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Figure  14:  Hartmann  flow  simulation  with  H  =  10'*.  Flow  velocity  vectors 
and  magnetic  field  lines  are  shown.  The  velocity  of  the  flow  is  zero  every¬ 
where  except  at  the  plates.  The  magnetic  field  lines  have  a  constant  slope 
through  the  domain. 


the  flow.  This  magnetic  field  is  described  by 


^  =  +  (50) 

Using  the  flow  solution  of  eqn(49),  the  solution  for  Bx  is 


/  Rm\ 

(Vl-Vo\ 

cosh(jH/2)  —  cosh.  {H{L  —  2y)/2L) 

[ir) 

V  2  ) 

sinh(H/2) 

The  boundary  conditions  are  that  Bx  vanishes  at  the  plates  and  the  net 
current  is  zero.  The  first  boundary  condition  may  seem  arbitrary,  but  it  is 
consistent  with  the  no-slip  boundary  condition  applied  to  the  flow  solution. 
The  second  boundary  condition  relates  the  applied  electric  field,  Eq^  and  the 
plate  velocities. 

Eg  ^  Vb  +  Vl 

Bo  2 

Since  the  MHD  equation  set  does  not  allow  for  an  applied  electric  field,  Vq 
is  set  to  so  that  Eq  =  0. 

We  performed  simulations  for  large,  small,  and  intermediate  Hartmann 
numbers,  H. 
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Figure  15:  Hartmann  flow  simulation  with  H  =  0.1.  Flow  velocity  vectors 
and  magnetic  field  lines  are  shown.  The  velocity  profile  is  linear  and  the 
magnetic  field  lines  have  an  “S”  shape  caused  by  the  bulk  fluid  flow. 


For  a  large  Hartmann  number,  the  effects  of  viscosity  and  resistivity  are 
small,  and  the  solution  approaches  that  of  ideal  MHD.  The  flow  velocity 
vanishes  everywhere  except  on  the  plates,  like  it  does  for  the  inviscid  case 
{Re  — >  oo).  The  magnetic  field  is  frozen  into  the  plates  and  develops  a 
slope  (constant  Bx)  as  the  plates  move.  The  slope  of  the  magnetic  field  is 
determined  by  the  value  of  H  (the  field  lines  slip  through  the  plates  due 
to  resistivity).  The  slope  of  the  magnetic  field  lines  {Bo/Bx)  is  constant 
at  H/Rm.  Figure  14  shows  the  results  from  simulation  with  H  —  10"^.  A 
finite  value  of  the  flow  velocity  exists  only  at  the  plates.  The  magnetic  field 
lines  are  straight  except  at  the  plates  where  Bx  is  forced  to  vanish  because 
of  the  boundary  conditions.  For  clarity  the  slope  of  the  magnetic  field  has 
been  normalized  to  unity  at  the  midplane  between  the  plates  for  all  of  the 
Hartmann  flow  simulations. 

The  limiting  case  of  small  Hartmann  number  is  characterized  by  a  flow 
that  is  dominated  by  viscous  effects  and  a  magnetic  field  that  responds  to 
the  bulk  fluid  flow  and  the  large  resistivity.  The  flow  velocity  varies  linearly 
from  the  velocity  of  the  top  plate  to  the  velocity  of  the  bottom  plate,  as 
described  by  eqn(40).  The  magnetic  field  diflFuses  through  the  plate  and 
the  bulk  fluid,  but  the  fluid  drags  the  field  lines  along  with  the  flow.  This 
produces  a  swayed  “S”  shape  to  the  field  lines  with  a  peak  magnetic  field 
at  the  midplane.  Since  the  slope  of  the  field  lines  is  inversely  proportional 
to  the  magnitude  of  Bx^  the  peak  in  the  magnetic  field  corresponds  to  the 
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Figure  16:  Hartmann  flow  simulation  with  H  =  10.  Flow  velocity  vectors 
and  magnetic  field  lines  are  shown.  The  flow  velocity  only  exists  close  to 
the  plates.  The  magnetic  field  lines  are  linear  around  the  midplane. 

field  lines  with  the  minimum  slope  (most  horizontal).  The  minimum  slope 
is  AfRm.  The  simulation  results  for  H  =  0.1  are  shown  in  Figure  15.  Notice 
the  linear  velocity  profile  and  the  swayed  magnetic  field  lines. 

Flows  with  Hartmann  numbers  in  the  intermediate  ranges  have  solutions 
which  exhibit  characteristics  of  both  of  the  limiting  cases.  The  flow  velocity 
falls  to  zero  away  from  the  boundaries  in  a  scale  length  of  L/H.  This 
scale  length  is  an  appreciable  fraction  of  the  domain.  The  magnetic  field  is 
influenced  by  the  motion  of  the  plates  and  the  fluid  flow.  The  magnetic  field 
has  a  swayed  shape  close  to  the  plates  and  is  linear  around  the  midplane. 
Away  from  the  boundaries  {L/H  <  y  <  L—L/H)^  the  value  of  Bx  is  constant 
at  BoRm/H.  Figure  16  shows  the  results  from  a  simulation  with  H  =  10. 
The  velocity  profile  falls  to  zero  around  the  midplane.  The  magnetic  field 
lines  have  a  swayed  shape  like  those  in  Figure  15  but  not  as  dramatic,  and 
they  are  linear  around  the  midplane. 

All  of  the  Hartmann  flow  simulations  converged  to  the  analytical  solution 
to  within  errors  of  lO*^®. 

3.3  MPD  Plasma  Thruster 

The  magnetoplasmadynamic  (MPD)  thruster  is  an  electric  propulsion  device 
for  spacecraft.  Electrical  propulsion  is  a  technological  field  that  is  impor¬ 
tant  to  the  Air  Force  and  industry  for  satellite  station  keeping  and  orbital 
maneuvering.  This  problem  demonstrates  the  dual  time-stepping  algorithm, 
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Figure  17:  Geometry  of  the  two-dimensional  MPD  thruster. 


which  allows  flexible  choice  of  time  steps  so  that  fast  and  slow  transients 
can  be  tracked  accurately  and  efficiently.  This  is  also  the  first  problem  that 
exercises  all  of  the  parts  of  the  new  algorithm  (the  approximate  Riemann 
solver,  the  LU-SGS  relaxation  scheme,  the  resistive  and  viscous  terms,  and 
the  dual  time-stepping)  simultaneously.  The  problem  geometry  is  shown 
in  Figure  17.  A  current  is  applied  across  the  left  boundary.  This  current 
creates  a  magnetic  field  in  the  2:  direction  that  in  turn  leads  to  a  j  x  B  force 
that  accelerates  the  plasma  to  the  right.  We  expect  that  the  plasma  initially 
in  the  domain  will  be  accelerated  up  to  some  exit  velocity  on  a  fast  time 
scale  related  to  the  Alfven  time.  However,  if  there  is  a  finite  resistivity  in 
the  plasma,  the  magnetic  field  and  current  at  the  left  boundary  will  diffuse 
into  the  domain  on  a  slower  time  scale  related  to  the  resistive  diffusion  time. 
Ideally,  one  would  like  to  take  small  time  steps  initially  to  follow  the  fast 
transient,  and  then  switch  to  a  much  larger  time  step  when  the  system  is 
evolving  more  slowly. 

If  there  is  no  viscosity,  then  the  problem  becomes  one-dimensional  in  rr, 
which  is  to  the  right  in  Figure  17.  For  this  problem  we  chose  a  Lundquist 
number  of  100,  a  reference  magnetic  field  of  1  Tesla,  a  reference  density 
of  10“^  kgjrr?,  a  reference  length  of  10  cm,  and  an  imposed  current  of 
30  kA.  Figure  18  shows  the  plasma  velocity  as  a  function  of  x  at  several 
different  times  (normalized  to  the  Alfven  time).  The  top  plot  shows  the 
results  of  an  explicit  time-differencing  simulation  with  a  CFL  number  of 
1.  This  simulation  took  2600  time  steps  to  advance  the  solution  to  t  = 
10.17.  Notice  that  between  3  and  5  Alfven  times,  the  velocity  reaches  a 
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Figure  18:  Plasma  velocity  as  a  function  of  x  and  time  for  explicit  time 
differencing  simulation  and  dual  time-stepping  (implicit)  simulation. 


constant  uniform  value  along  the  length  of  the  domain.  The  bottom  plot  is 
a  simulation  in  which  a  CFL  number  of  1  was  used  until  t  =  1.5,  at  which 
point  the  CFL  number  was  increased  to  100  and  the  dual  time-stepping 
implicit  method  was  used  to  maintain  stability.  At  each  physical  time  step 
it  took  about  30  pseudo-time  steps  to  converge,  so  the  overall  number  of 
iterations  was  reduced  to  1090  for  the  dual  time-stepping  case.  The  plots 
look  similar  to  the  explicit  time-stepping  results,  except  that  the  end  of 
the  fast  transient  is  filtered  out  by  taking  such  large  time  steps.  On  the 
other  hand.  Figure  19  shows  that  the  magnetic  field,  which  evolves  on  the 
slower  resistive  diffusion  time  scale,  is  captured  equally  well  by  the  explicit 
and  implicit  schemes.  The  development  of  the  plasma  velocity  and  internal 
magnetic  field  can  be  seen  in  Figures  20  and  21. 
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Pseudo  Time  Iterations,  CFL  No.  =  100 


Figure  19:  Magnetic  field  as  a  function  of  x  and  time  for  explicit  time 
differencing  simulation  and  dual  time-stepping  (implicit)  simulation. 

3.4  Magnetic  Reconnection 

In  this  application  we  present  results  demonstrating  agreement  between  the¬ 
oretical  linear  growth  rates  of  the  resistive  instability  in  a  sheet  pinch  and 
our  non-linear  resistive  MHD  code.  We  study  resistive  instabilities  because 
they  axe  a  likely  candidate  for  driving  magnetic  relaxation  in  the  Helic- 
ity  Injected  Tokamak  (HIT).  The  planar  sheet  pinch  is  a  well  understood 
configuration[12,  13,  14,  15]  and  provides  a  good  test  problem  and  bench¬ 
mark  for  our  MHD  code. 

We  present  the  linear  analysis  of  the  sheet  pinch.[13,  14]  The  linear 
equations  are  solved  numerically  to  obtain  the  eigenmodes.  The  eigenvalues 
(growth  rates)  are  compared  with  the  analytical  theory.  [12]  We  then  present 
the  nonlinear  analysis  where  our  implicit  MHD  code  is  applied.  A  pertur¬ 
bation  is  initialized  in  the  MHD  code.  The  instability  resulting  from  the 
perturbation  is  allowed  to  develop  and  finally  saturates  due  to  non-linear 
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Plasma  Gun  Simulation  -  Velocity  Vector  Plot 


Applied  current:  1^  =  30,000 
’Reservoir’  density:  p,=  10 
I7d  =  5 

Processor  grid:  4x8 
Grid  size:  400  x  80 


Figure  20:  Velocity  vectors  for  the  MPD  thruster.  The  plasma  is  accelerated 
down  the  gun  by  the  I  x  B  force  and  a  boundary  layer  develops.  The  internal 
blocks  illustrate  the  decomposition  of  the  domain  used  for  the  validation  of 
the  parallel  version  of  the  code. 

effects.  The  initially  linear  growth  rate  agrees  with  linear  analysis. 

3.4.1  Problem  Description 

We  study  the  resistive  instability  in  a  planar  sheet  pinch,  the  symmetric 
tearing  mode  in  a  finite-thickness  current  sheet.  See  Figure  22  for  schematic 
representation.  For  simplicity  we  examine  the  mode  with  the  wave  vector 
parallel  to  the  equilibrium  magnetic  field. 

k  II  Bo  (53) 


We  define 


(54) 
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Plasma  Gun  Simulation  -  Contour  Plot 


Applied  current:  1^  =  30,000 
’Reservoir’  density:  p,=  10 
Lyd  =  5 

Processor  grid:  4x8 
Grid  size:  400  x  80 
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Figure  21:  Magnetic  field  (Bz)  contours  for  the  MPD  thruster.  The  gradient 
in  the  magnetic  field  produces  the  force  applied  to  the  plasma. 

where  o  is  the  characteristic  width  of  the  current  sheet.  The  resistivity  of 
the  current  sheet  is 

-^=cosh2(^)  (55) 

Tjref  \(X' 

which  satisfies  the  equilibrium  induction  equation  with  no  flow.  The  resis¬ 
tivity  has  a  minimum  in  the  middle  of  the  current  sheet  (y  =  0),  and  the 
magnetic  field  vanishes  at  y  =  0  and  is  positive  for  y  >  0  and  negative  for 
y  <  0.  See  Figure  23  for  the  equilibrium  profiles. 

3.4.2  Linear  Analysis 

For  the  linear  analysis,  we  begin  with  the  incompressible,  resistive  MHD 
equations.  We  assume  a  variation  of  the  perturbations  of  the  form 

/  =  /(!/,t)e“*.  (66) 
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Figure  22:  Schematic  of  planar  sheet  pinch  problem  [from  H.  P.  Furth, 
Phys.  Fluids  28(6),  1595  (1985)]. 


The  perturbation  equations  yield  a  pair  of  coupled,  linear  differential  equations.  [14] 

=  ”  (0  -  (”> 


dt 


d  fd^W  2  \  2r  2  2t\ 


d'^F 

dy^ 


where  Lu  is  the  Lundquist  number  and 

~  Bo 


(58) 


(59) 


W  =  —ikTrVxl 


(60) 


a  =  ka 


(61) 


Lurji 


(62) 


This  coupled  pair  of  PDE’s  are  solved  numerically  using  an  implicit  finite 
difference  formulation.  The  eigenfunctions  for  Lu  =  10^  and  a  =  0.5  are 


33 


Figure  23:  Equilibrium  profiles  of  normalized  magnetic  field  and  resistivity. 

shown  in  Figure  24.  The  growth  rates  have  also  been  found  analytically.  [12] 
For  the  pure  symmetric  tearing  mode  the  growth  rate  is  given  by 

7  =  0.954(1  -  '  .  (63) 

For  values  of  Lu  greater  than  500,  the  numerically  calculated  and  analytical 
linear  growth  rates  agree  to  within  a  few  percent. 

3.4.3  Non-Linear  Analysis 

A  planar  current  sheet  is  initialized  in  the  code,  and  a  perturbation  of  the 
same  form  as  the  linear  eigenfunction  is  superimposed.  The  growth  rate  is 
determined  from  the  amount  of  reconnected  flux  at  ?/  =  0.  The  evolution 
of  the  non-linear  perturbation  is  shown  in  Figure  25.  The  result  from  the 
linear  analysis  is  also  shown.  The  non-linear  growth  matches  the  linear 
prediction  during  early  development  of  the  instability,  but  during  late  time 
the  instability  saturates  due  to  non-linear  effects.  Magnetic  flux  contours 
are  shown  in  Figure  26  which  show  the  magnetic  island  formation  of  the 
non-linear  instability. 

3.5  HIT  Injector 

One  of  the  flrst  applications  for  the  new  code  will  be  to  simulate  the  HIT 
experiment.  The  experiment  is  shown  schematically  in  Figure  27.  The 
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Figure  24:  The  eigenfunctions  for  Lu  =  10^  and  a  =  0.5. 


geometry  is  toroidal,  but  only  a  single  slice  in  the  poloidal  plane  is  pictured. 
HIT  is  a  low  aspect  ratio  tokamak  that  uses  helicity  injection  to  produce 
toroidal  current.  Gas  is  puffed  into  the  injector  and  then  a  series  of  capacitor 
banks  are  discharged  across  the  electrodes  to  form  the  plasma  and  interact 
with  the  applied  magnetic  fields  to  push  the  plasma  into  the  confinement 
region,  where  the  tokamak  plasma  is  formed.  The  full  simulation  will  require 
three-dimensional,  multiblock  capability,  which  the  code  does  not  yet  have. 
However,  the  injector  portion  of  the  experiment  can  be  modeled  as  a  single 
block. 

For  the  first  simulation  we  made  the  further  simplification  of  solving 
the  two-dimensional  problem  rather  than  the  true  cylindrical  problem.  We 
chose  an  initial  poloidal  bias  field  that  is  uniform  throughout  the  injector 
with  By  —  0.00 IT  and  Bx  =  0^  where  the  x  direction  is  up  (toward  the  con¬ 
finement  region)  in  Figure  27.  The  initial  toroidal  (out  of  plane)  field  was 
zero  in  this  case,  and  a  current  of  30kA  was  applied  across  the  electrodes 
(at  the  bottom  boundary  in  Figure  27).  Plasma  was  placed  at  the  bottom 
boundary  with  a  density  ten  times  higher  than  the  initial  background  den¬ 
sity.  The  Lundquist  number  was  1000.  A  grid  with  44  cells  in  the  x  direction 
and  12  points  in  the  y  direction  was  used. 

The  results  of  the  simulation  after  ten  Alfven  times  are  shown  in  Figure 
28.  Contours  of  density  and  the  in-plane  magnetic  field  lines  are  plotted. 
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Figure  25:  The  linear  and  non-linear  evolution  of  the  reconnected  flux. 


The  current  at  the  left  boundary  {x  —  0)  induces  out-of-plane  magnetic  field 
that  results  in  a  j  x  B  force  that  brings  in  plasma  from  the  left  and  pushes 
the  plasma  to  the  right  (in  the  x  direction)  towards  the  confinement  region. 
The  contours  of  density  show  the  higher  density  plasma  being  carried  into 
the  domain.  In  addition,  the  initially  straight  field  lines  are  stretched  as 
the  plasma  flows  across  them.  However,  the  density  contours  do  not  overlay 
with  the  field  lines  as  they  would  in  the  limit  of  zero  resistivity.  This  is 
consistent  with  the  relatively  low  Lundquist  number  for  this  simulation. 

3.6  Three-Dimensional  Magnetic  Relaxation 

We  have  performed  a  preliminary  study  of  three-dimensional  magnetic  re¬ 
laxation  in  toroidal  configurations  using  the  code.  These  configurations  are 
similar  to  the  previous  compact  toroid  experiment  (MARAUDER)  at  the 
Phillips  Laboratory.  [18]  The  major  exception  is  that  the  geometry  used  here 
is  a  toroid  with  a  square  cross  section  and  smaller  aspect  ratio.  An  article 
describing  this  application  will  be  published  by  the  Maui  High  Performance 
Computing  Center  (MHPCC)  as  a  HPC  Success  Story. 

The  plasma  is  initialized  with  uniform  density  and  pressure  and  a  core  of 
purely  toroidal  field  surrounded  by  purely  poloidal  field.  This  corresponds  to 
a  current  sheet  around  the  toroidal  field.  The  fields  strengths  are  adjusted  so 
the  forces  balance  at  the  current  sheet.  In  toroidal  coordinates,  the  toroidal 
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Figure  26;  Flux  contours  of  the  developed  non-linear  instability. 


field  is 

Bt  =  (64) 

and  the  poloidal  field  is 

where  R  is  the  major  radius,  r  is  the  minor  radius,  and  Bo  is  maximum  mag¬ 
netic  field  value.  Because  the  plasma  geometry  has  a  square  cross  section, 
the  poloidal  field  beyond  r  =  a/2  was  set  to  zero  to  satisfy  the  divergence- 
free  condition  on  the  magnetic  field. 

We  know  the  final  plasma  configuration  based  on  energy  minimization. 
This  configuration  is  called  a  Taylor  state.  [19]  Taylor  theorized  that  the 
magnetic  field  will  rearrange  or  relax  through  tearing  and  reconnection  to 
arrive  at  the  lowest  energy  state  while  maintaining  the  total  magnetic  helic- 
ity.  Helicity  is  the  amount  of  magnetic  flux  linkage.  For  this  geometry,  the 
Taylor  state  has  an  analytical  form. 


Br  —  Bokz 


JlikrV) 


Ji(A) 

Yi(A) 


Yi(A:,r) 


cos{kzz) 


(66) 
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Figure  27:  Schematic  of  the  HIT  plasma  experiment. 


Be  =  BoVW+M 
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where  Jm  and  Ym  are  the  ordinary  Bessel  functions,  and  kr^  kz^  and  A  are 
chosen  to  satisfy  the  boundary  conditions. 

While  Taylor’s  theory  agrees  with  most  experimental  data,  the  mecha¬ 
nism  for  relaxation  is  not  imderstood.  If  the  relaxation  involves  global  tear¬ 
ing  and  reconnection,  the  process  may  destroy  the  plasma.  The  simulations 
that  we  performed  are  providing  great  insight  into  the  exact  mechanisms 
and  their  dependence  on  the  magnetic  Reynolds  number. 

The  simulation  results  for  a  compact  toroid  with  a  magnetic  Reynolds 
number  of  10^  and  an  aspect  ratio  of  1.5  are  shown  in  Figure  29  after  the 
plasma  has  evolved  for  10  Alfven  transit  times.  The  formation  of  a  toroidal 
perturbation  with  a  primary  mode  number  of  3  can  be  seen  in  the  contours  of 
the  poloidal  flux.  Furthermore,  the  mode  is  localized  on  a  magnetic  surface 
{q  =  1).  Interestingly  the  perturbation  saturates  and  does  not  destroy  the 
plasma.  This  mode  is  a  strong  candidate  for  the  mechanism  responsible  for 
magnetic  relaxation. 
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Figure  28:  Results  of  two-dimensional  simulation  of  HIT  injector.  Plot 
shows  density  contours  and  poloidal  magnetic  field  lines. 


When  the  magnetic  Reynolds  number  is  decreased,  the  toroidal  mode 
structure  weakens  and  the  simulations  are  almost  axisymmetric.  This  be¬ 
havior  is  due  to  the  larger  resistive  diffusion  that  occurs  at  the  lower  mag¬ 
netic  Reynolds  numbers.  Instead  of  magnetic  tearing  and  reconnection, 
the  magnetic  fields  merely  diffuse  through  each  other  in  a  symmetric  man¬ 
ner.  Results  are  shown  in  Figure  30  for  a  compact  toroid  with  a  magnetic 
Reynolds  number  of  10^  and  an  aspect  ratio  of  1.5.  The  results  corresponds 
to  the  parameters  of  a  cold  plasma. 

Other  simulations  were  performed  at  larger  aspect  ratios.  The  larger 
aspect  ratios  simulations  showed  a  similar  mode  structure  but  with  a  higher 
toroidal  mode  number.  This  phenomena  is  still  under  investigation  and  will 
be  submitted  for  publication. 

3.7  Nonlinear  Tilt  Instability  in  the  Spheromak 

There  is  a  renewed  interest  in  “alternative”  plasma  confinement  concepts. 
One  of  these  concepts  is  the  spheromak  which  is  a  toroidal  plasma  confine¬ 
ment  concept  where  no  materials  such  vacuum  vessels  or  magnetic  field  coils 
link  the  toroid.  [20]  Spheromaks  were  first  studied  by  astrophysicists.  They 
are  interesting  since  they  are  force-free,  simple  structures  with  closed  flux 
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Figure  29:  Contours  of  poloidal  flux  showing  the  toroidal  mode  structure  of 
a  relaxing  compact  toroid  with  a  magnetic  Reynolds  number  of  10^  after  10 
Alfven  transit  times. 

surfaces  towards  which  space  plasmas  tend  to  evolve. 

Plasma  stability  is  a  major  issue  for  spheromaks.  Stability  of  spheromaks 
has  been  first  studied  by  Rosenbluth  and  Bussac  in  a  spherical  configuration.  [21] 
Using  linear  theory  for  a  force- free  equilibrium  (j  =  AB,  where  A  is  indepen¬ 
dent  of  position),  they  showed  that  an  oblate  spheromak  is  stable  against 
all  internal  modes  if  surrounded  by  a  closed  fitting  conducting  shell.  The 
plasma  is  unstable  if  the  boundary  is  prolate. 

Finn  and  Manheimer[22]  and  Bondeson  et  aZ.,[23]  studied  the  tilt  in¬ 
stability  of  spheromaks  in  a  cylindrical  geometry.  The  tilt  instability  is  a 
relaxation  to  a  minimum  energy  state  during  which  the  magnetic  axis  of  the 
spheromak  tilts.  In  both  papers  cited  above  the  authors  used  linear  theory 
and  found  that  for  aspect  ratios  {L/R)  less  than  1.67  the  spheromaks  in 
cylindrical  geometry  are  stable  to  tilt.  However,  some  experiments  showed 
that  oblate  spheromaks  still  tilted.  [24] 

The  goals  of  our  study  were  to  validate  the  code  against  theoretical  and 
experimental  results  obtained  for  a  tilting  prolate  spheromak  and  under¬ 
stand  why  oblate  spheromaks  tilt. [25]  To  test  our  code  we  tried  to  match 
the  growth  rate  obtained  with  a  linear  stability  code  described  by  Shumlak 
et  al[2Q]  A  small  perturbation  to  the  spheromak  equilibrium  should  grow 


Figure  30:  Contours  of  poloidal  flux  showing  only  a  weak  toroidal  mode 
structure  for  a  relaxing  compact  toroid  with  a  magnetic  Reynolds  number 
of  10^  after  10  Alfven  transit  times. 

initially  at  the  linear  growth  rate.  Eventually  nonlinear  effects  would  satu¬ 
rate  the  growth  of  the  mode. 

For  our  simulation  we  selected  an  aspect  ratio  L/R  —  "i  where  linear 
theory  showed  that  the  growth  rate  is  maximum  for  spheromaks  with  /3 
between  0  and  6%  where  /?  =  ^  100%.  The  normalized  growth 

rate  obtained  was  774  =  0.20015. 

The  nonlinear  simulation  with  our  code  used  a  non-simply  connected 
grid  made  of  ten  blocks,  each  with  15  x  15  x  20  cells.  (See  Figure  31.)  This 
particular  type  of  grid  was  chosen  since  it  has  only  four  pairs  of  cells  with 
high  aspect  ratio.  Alternative  grid  options  are  a  pie  slice  grid  which  has  high 
aspect  ratio  cells  all  around  the  circumference  and  has  a  singular  point  at 
the  axis  and  a  distorted  square  grid  which  has  high  aspect  ratio  cells  around 
the  circumference.  These  alternative  grids  are  shown  in  Figure  32. 

The  initial  magnetic  field  is  obtained  from  the  force- free  equilibrium. 

Br  =  —kziiikrv)  cos{kzz) 

Be  =  AoJi(A:rr)  s\n{kzz) 

Bz  =  fcrJo(^r^)  sm{kzz) 

where  krR  =  3.832,  kzL  =  tt  and  Aq  =  \/k^  +  Uniform  density  and 
pressure  profiles  are  initialized  to  give  /?  =  8%.  The  initial  perturbation  is 
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Figure  31:  The  ten  block  grid  used  for  the  spheromak  simulation.  Some  of 
the  blocks  has  been  removed  for  illustration. 

the  velocity  field  obtained  from  the  linear  code  and  interpolated  to  the  three 
dimensional  grid.  Figure  33  shows  the  initial  velocity  field. 

Our  first  goal  was  to  match  the  growth  rate  obtained  from  the  linear 
code  at  early  time.  Since  our  code  is  nonlinear,  the  size  of  the  initial  per¬ 
turbation  is  critical.  We  found  that  if  the  initial  perturbation  is  too  large 
i'^max  ^  then  the  spheromak  structure  is  destroyed.  If  the  ini¬ 

tial  perturbation  is  too  small  the  algorithm  produces  flows  which  are  larger 
than  the  initial  perturbation.  These  spurious  flows  were  the  result  of  the 
generalized  coordinate  formulation  which  are  being  corrected  by  using  the 
unaligned  finite  volume  formulation  described  in  Section  2.2.7. 

For  an  initial  perturbation  of  =  5.5%?;^  the  spheromak  tilts  and 
gives  encouraging  results  as  shown  in  Figure  34.  When  the  tilt  instability 
saturates,  the  plasma  axis  is  not  perpendicular  to  its  original  orientation. 
The  final  state  is  oriented  so  that  the  plasma  has  expanded  into  the  corners 
of  the  flux  conserver  which  further  minimizes  the  magnetic  energy.  Figure  35 
shows  the  growth  of  the  kinetic  energy  with  time.  The  linear  growth  rate  has 
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Figure  32:  Two  alternate  grids  that  can  be  used  for  simulations  of  cylindrical 
configurations,  (a)  The  pie  slice  grid  has  large  aspect  ratio  cells  along 
circumference  and  a  singularity  at  the  axis,  (b)  The  distorted  square  grid 
has  high  aspect  ratio  cells  along  the  circumference. 


also  been  plotted  for  comparison.  The  rate  of  growth  is  limited  by  the  linear 
growth  rate  as  expected.  The  initial  decrease  in  kinetic  energy  is  related  to 
the  spurious  flow  produced  by  the  generalized  coordinate  formulation. 

3.8  MPD  Plasma  Thruster 

A  plasma  thruster  uses  the  j  x  B  body  force  to  accelerate  the  plasma.  Ad¬ 
vantages  of  this  type  of  thrusters  over  chemical  thrusters  are  a  higher  specific 
impulse  and  higher  efficiency.  The  higher  specific  impulse  leads  to  savings 
in  propellant  mass  for  a  mission  with  a  specified  AV. 

We  have  studied  the  magnetoplasmadynamic  (MPD)  plasma  thrusters. 
Our  goal  is  to  validate  the  code  against  computational  and  experimental 
results  of  Slezidna  er  al[27]  and  to  improve  the  thruster  design. 

We  have  studied  channel  MPD  thrusters  successfully  during  tests  of  our 
code.  The  current  work  modeled  the  more  realistic  annular  MPD  thrusters. 
In  an  annular  MPD  thruster  a  current  is  driven  through  the  plasma  radially 
by  coaxial  electrodes  (the  anode  is  at  the  larger  radius).  The  self-generated 
magnetic  field  and  the  current  give  the  j  x  B  force  that  accelerate  the  plasma. 

A  Hall  plasma  thruster  also  has  a  coaxial  configuration.  A  radial  mag¬ 
netic  field  and  an  axial  electric  field  are  applied  which  produces  an  azimuthal 
current  by  the  Hall  Effect.  The  Hall  current  is  created  by  electron  drifting 
azimuthally  at  a  high  speed.  The  electrons  ionize  the  injected  gas  propellant 
to  form  a  plasma.  Then  the  Hall  current  interacts  with  the  radial  magnetic 
field  to  accelerate  the  plasma  axially. 
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Figure  33:  Initial  velocity  field.  Contours  represent  the  magnitude  of  the 
toroidal  velocity  component  and  the  vectors  represent  the  poloidal  velocity. 
Velocity  field  is  normalized  with  respect  to  the  Alfven  speed. 


Our  simulation  uses  a  grid  composed  of  four  blocks  (see  Figure  36)  in 
an  axisymmetric  configuration.  The  electrodes  are  modeled  as  perfect  con¬ 
ductors.  The  inlet  gas  was  injected  at  a  pressure  that  is  5%  higher  than  the 
initial  pressure  inside  the  thruster.  The  expansion  region  beyond  the  exit 
of  the  thruster  is  also  modeled  to  examine  the  plume  of  the  thruster.  The 
thruster  current  was  held  constant  at  IkA,  The  plasma  temperature  was 
O.SfceV. 

Results  are  encouraging  and  further  investigation  is  required  in  order  to 
validate  the  code  against  the  mentioned  experimental  results.  In  Figure  37 
we  present  the  velocity  vectors  and  contours  of  magnetic  field  after  approx¬ 
imately  20  Alfven  transit  times.  The  radial  dependence  of  the  velocity  is 
caused  by  the  radial  dependence  of  the  j  x  B  accelerating  force.  A  vortex 
ring  is  observed  shedding  from  the  cathode,  in  the  upper  quarter  of  the  do- 
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Figure  34:  Contours  of  toroidal  magnetic  field  Be  through  a  cross-section  of 
the  spheromak  at  0  =  90^.  White  contours  represents  positive  values  and 
black  negative  values. 


main.  The  magnetic  field  balloons  beyond  the  annular  region  between  the 
electrodes  as  measured  in  experiments. 

Tests  showed  that  initialization  with  lower  pressures  (/?)  of  the  compu¬ 
tational  domain  gave  non-physical  results,  i.e.  negative  pressures,  after  a 
few  time  steps.  There  two  factors  that  might  lead  to  such  behavior.  One 
is  related  to  the  initialization,  specifically  the  sudden  application  of  the 
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Figure  35:  Evolution  of  the  kinetic  energy  of  the  spheromak  as  a  function 
of  normalized  time.  Note  the  nonlinear  behavior  at  the  beginning  of  the 
simulation. 

magnetic  field  to  the  stationary  plasma.  This  can  lead  to  vacuum  or  near 
vacuum  conditions  being  created  in  a  few  grid  cells  after  the  first  time  step. 
Further  mass  transport  can  lead  to  a  negative  pressure  situation.  We  are  in¬ 
vestigating  starting  the  simulation  with  a  finite  current  rise  time  (as  is  more 
physical).  The  applied  current  would  increase  until  the  operating  value  is 
reached. 

Another  source  of  the  errors  at  lower  pressures  is  the  numerical  overes¬ 
timation  of  the  wave  speeds  which  leads  to  the  depletion  of  mass  in  regions 
of  the  domain  and  finally  to  negative  pressures.  To  eliminate  this  effect  we 
are  following  the  work  by  Einfeldt  et  al.[28]  They  describe  the  calculation 
of  the  fiux  at  a  cell  interface  such  that  the  scheme  is  positively  conservative, 
cannot  lead  to  negative  pressure.  We  are  applying  their  scheme,  developed 
for  a  first  order  accurate  algorithm,  to  our  second  order  accurate  algorithm. 
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Figure  36:  (a)  Four  block  grid  for  the  axisymmetric  MPD  simulation  and  (b) 
the  initial  contours  of  the  magnetic  field.  (Black  contours  represent  higher 
values.) 

4  Parallel  Computer  Implementation 

We  have  begun  to  investigate  strategies  for  implementing  the  algorithm  on 
parallel  architectures.  The  first  of  the  following  sections  describes  our  first 
and  the  simplest  approach,  which  was  to  parallelize  the  LU-SGS  algorithm 
in  a  point-wise  manner.  This  proved  to  be  too  fine-grained  to  be  efficient,  so 
we  have  since  opted  for  a  domain  decomposition  approach  which  is  described 
in  the  second  section.  The  third  section  describes  the  implementation  of  this 
method  on  the  MHD  solver. 


Figure  37:  (a)  Flow  field  and  (b)  contours  of  azimuthal  magnetic  field. 
(Black  contours  represent  higher  values.)  Note  the  vortex  ring  shedding  off 
the  cathode. 

4.1  Fine-Grain  Pairallelization 

The  LU-SGS  algorithm  involves  a  double  sweep  of  the  computational  do¬ 
main.  The  forward  (predictor)  sweep  solves  a  lower  tridiagonal  block  matrix 
for  the  entire  computational  domain.  The  backward  (corrector)  sweep  solves 
an  upper  tridiagonal  block  matrix.  Figure  38  shows  the  form  of  the  lower 
and  upper  block  diagonal  matrices  for  the  case  of  a  4  x  5  grid.  Because  of 
the  lower-upper  form  of  the  matrices,  the  solutions  at  grid  cells  along  a  line 
of  constant  i-\-  j  are  independent. 

The  simplest  parallel  implementation  is  to  decompose  the  domain  into 
its  component  cells,  distribute  the  grid  cells  over  the  processors  of  the  par¬ 
allel  computer,  and  treat  each  cell  as  residing  on  a  different  processor.  This 
approach  exploits  the  independence  of  the  solutions  of  the  cells  on  lines 
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Figure  38:  The  20  x  20  lower  (a)  and  upper  (b)  tridiagonal  block  matrices 
for  the  LU-SGS  algorithm  with  a  grid  of  4  x  5  cells. 

of  constant  i  +  j.  Communication  between  the  cells  provides  the  neces¬ 
sary  synchronization.  For  these  tests,  we  used  the  Parallel  Virtual  Machine 
(PVM)  communication  library  which  was  developed  at  Oak  Ridge  National 
Laboratory.  [29]  PVM  allowed  us  to  connect  a  network  of  four  DEC  Alpha 
workstation  and  use  them  as  our  parallel  computer. 

To  determine  the  parallel  effectiveness,  we  measured  the  speedup  ob¬ 
tained  when  a  problem  grid  of  constant  size  was  evenly  distributed  onto  an 
increasing  number  of  processors.  Speedup  is  defined  as  the  time  required  to 
find  the  solution  with  n  processors  divided  by  the  time  with  one  processor. 
For  perfectly  parallel  implementations,  the  speedup  would  be  equal  to  the 
number  of  processors.  Any  commimication  time  and  processor  synchroniza¬ 
tion  decreases  the  speedup. 

We  used  a  4  x  4  grid  and  varied  the  number  of  processors  from  one 
to  four.  While  this  was  a  small  size  problem,  it  was  sufficient  to  test  the 
parallel  implementation.  The  speedup  results  are  shown  in  Figure  39.  Some 
speedup  can  be  seen;  however,  the  amount  is  unsatisfactory. 

The  low  efficiencies  indicate  that  the  simplest  approach  for  parallel  im¬ 
plementation  of  the  LU-SGS  algorithm  is  inadequate.  The  results  are  not 
surprising  since  the  grain  of  parallelization  in  this  approach  is  too  fine  and 
requires  excessive  communication.  The  number  of  grid  cells  in  practical 
applications  will  be  much  greater  than  the  number  of  processors.  This  sug¬ 
gests  dividing  the  domain  into  a  number  of  large  blocks,  so  that  the  grid 
cells  within  a  block  are  located  on  the  same  processor  (and  memory)  and 
do  not  need  to  communicate  through  message  passing. 
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Figure  39:  The  parallel  speedup  for  a  problem  with  constant  size  grid  using 
a  fine-grain  parallelization  approach. 

4.2  Coarse-Grain  Parallelization 

In  this  section,  we  describe  the  coarse-grain  parallelization  of  the  MHD 
solver  and  the  performance  of  this  approach  applied  to  a  real  problem. 

The  algorithm  was  parallelized  using  the  domain  decomposition  tech¬ 
nique  (DDT).  This  technique  is  based  on  the  simple  idea  of  “divide  and 
conquer”  The  integral  form  of  a  general  conservation  law  is 

^^JdVQ  +  fdS-  F(Q)  ^  jdV  S(Q),  (69) 

O  S  n 

where  f)  is  the  domain  and  E  is  the  boundary  of  Q  is  the  vector  of 
conserved  variables,  F{Q)  is  the  flux  of  the  conserved  variables,  and  S{Q) 
is  the  vector  of  source  terms.  By  splitting  the  domain  Q  into  p  subdomains 
such  that 

p 

Q  =  |Jni,  (70) 

i=l 

one  can  replace  eqn(69)  with  a  set  of  p  conservation  equations  applied  on 
the  subdomains  Cli. 

^^jdyQ  +  fdS-F{Q)  =  JdVS{Q),  i  =  l,2,...,p  (71) 

Oj  S^  Oj 

Each  of  these  discretized  equations  is  solved  by  a  single  processor.  Each 
processor  uses  the  boundary  values  copied  from  neighboring  subdomains. 
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Figure  40:  (a)  Strip  decomposition  and  (b)  patch  decomposition  of  a  2-D 
domain. 


4.2.1  Domain  Decomposition 

To  abstract  the  computer  architecture,  we  assume  that  a  set  of  p  processors 
can  be  assigned  to  run  the  code  and  that  these  processors  implement  a 
message  passing  system.  For  simplicity  the  original  domain  is  assumed  to 
be  a  square  of  size  nx  n. 

The  2-D  version  of  the  algorithm  was  parallelized.  There  are  two  tech¬ 
niques  available  for  the  decomposition  of  2-D  domains,  the  strip  decompo¬ 
sition  and  the  patch  decomposition  which  are  shown  in  Figure  40. 

Strip  decomposition  is  implemented  by  dividing  the  original  domain  in 
subdomains  of  n  x  and  it  might  be  thought  of  as  a  1-D  decomposition. 
With  strip  decomposition  each  subdomain  needs  to  exchange  data  with  two 
neighbors  except  the  subdomains  at  the  boundaries  of  the  original  domain 
which  communicate  with  only  one  neighbor. 

The  communication  time  for  an  interior  subdomain  was  defined  by  Zhu 
[30]  as 

TD,,=2{a  +  80n)  (72) 

where  a  is  the  communication  start-up  time,  (3  is  the  time  required  to  send 
one  byte  of  data,  and  the  8  means  that  the  data  are  represented  as  double 
precision  variables  (their  size  is  eight  bytes). 
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Figure  41:  Column  decomposition  of  a  S-D  domain  is  an  immediate  exten¬ 
sion  of  the  patch  decomposition  of  2-D  domains. 

Patch  decomposition  is  implemented  by  dividing  the  original  domain  in 
^  X  With  this  method  each  processor  has  to  commimicate  with  four 
neighbors  unless  it  is  situated  on  the  boundaries  of  the  original  domain  and 
it  has  two  or  three  neighbors.  For  simplicity  it  is  assumed  that  p  is  an  even 
square  number  and  n  is  evenly  divisible  by  The  communication  time 
for  an  internal  subdomain  is 

7b33=4((T  +  8/3^).  (73) 

For  a  fixed  grid  size,  decreases  with  the  number  of  processors  since 
in  eqn  (73)  the  number  of  processors  appear  at  the  numerator.  In  contrast, 
Td2i  stays  constant  with  the  number  of  processors. 

This  made  the  patch  decomposition  an  obvious  choice  for  our  implemen¬ 
tation.  The  technique  will  also  provide  a  straightforward  extension  to  the 
column  decomposition  of  3-D  domains  (see  Figure  41). 

4.2.2  Implementation  of  the  Patch  Decomposition 

The  programming  model  used  for  the  implementation  was  single  program 
multiple  data  (SPMD).  Each  processor  runs  the  same  code  on  the  data 
corresponding  to  its  subdomain.  One  processor  has  to  perform  the  domain 
decomposition  and  send  the  data  to  the  other  processors.  This  processor 
was  designated  as  the  main  task. 

Assuming  that  there  are  p  processors  available  for  running  the  code 
they  can  be  arranged  in  a  processor  grid  of  pr  x  pc  =  p  where  Pr  is  the 
number  of  rows  and  pc  is  the  number  of  columns.  The  size  of  the  original 
computational  domain  is  m  x  n.  It  is  possible  to  have  subdomains  of  equal 
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sizes  only  if  m  and  n  are  evenly  divisible  with  pc  and  pr  respectively.  The 
domain  decomposition  was  implemented  such  that  some  processors  receive 
an  extra  row  or  extra  column  if  m  and  n  are  not  divisible  by  Pc  and  pr. 

Physical  coupling  of  the  subdomains  is  accomplished  by  the  exchange 
of  internal  boundaries.  A  processor  sends  the  data  from  the  cells  next  to 
its  boundaries  to  the  neighboring  processors  if  they  exist.  The  receiving 
processor  assigns  the  received  data  to  the  cells  of  its  respective  boundaries. 
If  a  processor  does  not  have  a  neighbor  in  a  certain  direction  the  boundary 
conditions  are  applied  to  that  boundary.  Since  the  algorithm  uses  a  five- 
point  stencil  only  one  row/column  needs  to  be  exchanged. 

4.2.3  Message  Passing 

One  of  the  goals  of  the  project  is  to  develop  a  portable  code.  A  first  step 
in  assuring  the  portability  was  to  use  a  message  passing  system  commonly 
available  on  parallel  supercomputers  and  on  workstation  clusters.  This  sys¬ 
tem  is  the  Message  Passing  Interface  (MPI)[16],  which  was  adopted  as  a 
standard  in  May  1994  by  industry  and  academia.  Hardware  and  software 
vendors’  implementation  of  MPI  provides  parallel  program  developers  with 
a  consistent  set  of  subroutines  callable  from  FORTRAN??  and  C.  In  our 
code  we  made  use  of  the  basic  point-to-point  communications  subroutines 
and  global  communications  subroutines.  The  point-to-point  communica¬ 
tion  subroutines  were  used  for  the  domain  decomposition  and  boundary 
exchange  while  the  global  communication  subroutines  were  used  for  conver¬ 
gence  checking.  All  message  passing  systems  (PVM,  MPL)  support  point- 
to-point  and  global  communications  subroutines  so  that  by  using  only  the 
basic  set  we  provided  for  a  facile  portability  to  systems  not  supporting  MPL 

4.2.4  Load  Balancing 

The  load  balancing  for  this  code  is  performed  by  distributing  an  approx¬ 
imately  equal  number  of  cells  to  each  processor.  This  is  accomplished  by 
the  main  task  during  the  domain  decomposition  phase.  Since  the  number  of 
fioating  point  operations  performed  by  each  processor  is  the  same,  a  static 
domain  decomposition  is  sufficient  to  ensure  that  the  processors  have  an 
equal  share  of  the  computing  load.  If  the  code  takes  were  to  allow  for  time- 
dependent  ionization  or  other  localized  phenomena  which  require  additional 
operations  in  a  limited  region  of  the  computational  domain,  then  a  dynamic 
load  balancing  procedure  may  be  necessary.  A  simple  algorithm  for  dynamic 
load  balancing  is  the  masked  multiblock  described  by  Borrelli  et  al.[31]  We 
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Speedup  vs.  Number  of  Processors 


Number  of  Processors 


Figure  42:  Fixed  grid  (400  x  80)  speedup  results.  Note  the  superlinear 
speedup  of  the  explicit  mode. 


will  implement  the  masking  algorithm  in  future  versions  of  the  code  if  it 
becomes  necessary. 

4.2.5  Results 

In  order  to  measure  the  performance  of  the  code  we  applied  the  parallel 
version  to  the  plasma  gun  problem  described  in  Section  3.3.  The  paral¬ 
lel  version  was  checked  against  the  sequential  version,  and  both  produced 
identical  results. 

There  are  two  criteria  generally  used  for  the  performance  analysis  of 
parallel  codes:  (1)  the  speedup  Sp  =  Tseq/Tp  and  (2)  the  efficiency  Ep  = 
Sp/p^  where  Tgeq  is  the  time  needed  for  the  best  sequential  algorithm  to 
complete  the  task  and  Tp  is  the  time  needed  for  the  parallel  algorithm  run 
on  a  number  of  p  processors  to  complete  the  same  task.  Note  that  the 
definition  of  speedup  used  here  is  more  rigorous  and  meaningful  than  the 
one  commonly  used  since  it  is  based  on  the  sequential  version  and  not  the 
parallel  version  on  one  processor. 

We  ran  the  parallel  code  on  the  IBM  SP2  with  a  fixed  grid  of  size  400  x  80 
on  a  processor  pool  of  varying  size:  4,  8,  16,  32  and  64  processors.  The 
speedup  for  the  explicit  and  implicit  modes  is  shown  in  Figure  42.  As 
expected  the  speedup  increases  with  the  number  of  processors  assigned  to 
run  the  code.  For  the  explicit  mode  the  speedup  is  superlinear,  which  seems 
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to  contradict  Amdahl’s  law 


5p  = - - •  (74) 

rp  I  Z-/»— 1  computation, i 

J- communication  >  p 

Assuming  that  no  time  is  used  for  communication  and  that  the  sum  of  the 
computation  time  for  all  processors  is  equal  to  the  sequential  computation 
time,  the  maximum  speedup  is  linear  (for  p  processors  the  speedup  is  p). 
However,  Amdahl’s  law  does  not  take  in  consideration  the  architecture  of  the 
system  used,  in  particular  the  cache  effects.  On  the  IBM  RS/6000  machines, 
which  constitute  the  nodes  of  the  SP2,  the  data  is  passed  from  the  main 
memory  to  the  CPU  through  a  data  cache.  A  data  cache  miss  involves 
a  delay  of  eight  CPU  cycles  while  the  data  in  the  cache  can  be  accessed 
in  one  cycle [32].  Noting  that  an  add  and  multiply  operation  (a  FLOP) 
takes  one  CPU  cycle  the  conclusion  is  that  a  data  cache  miss  decreases  the 
performance  significantly.  By  increasing  the  number  of  processors  in  the 
pool  and  keeping  the  overall  problem  size  constant,  we  reduced  the  amount 
of  data  assigned  to  a  processor.  Its  data  cache  could  hold  more  data  thus 
reducing  the  number  of  cache  misses  and  improving  the  performance,  which 
explains  the  superlinear  speedup.  The  same  behavior  was  observed  by  Michl 
et  al.,  on  a  cluster  of  IBM  RS/6000/500  workstations. [33] 

The  speedup  for  the  explicit  mode  is  higher  than  that  for  the  implicit 
mode  because  the  implicit  mode  is  the  more  computationally  intensive  and 
is,  therefore,  less  sensitive  to  cache  misses.  One  has  to  be  careful  when 
comparing  the  results  presented  in  Figure  42  since  the  number  of  iterations 
until  convergence  is  reached  for  the  implicit  mode  depends  on  the  number 
of  processors  used. 

The  trend  of  the  speedup  shows  an  increasing  slope  for  both  explicit 
and  implicit  modes  which  indicates  that  the  code  is  far  from  communication 
saturation.  Saturation  occurs  when  the  time  spent  on  communications  be¬ 
comes  comparable  with  the  computation  time.  If  the  number  of  processors 
is  increased  and  the  size  of  the  subdomains  becomes  smaller,  each  processor 
will  have  fewer  computations  to  perform,  but  the  total  time  spent  in  ex¬ 
changing  the  data  on  the  boundaries  will  increase.  The  total  time  spent  for 
boundary  exchange  can  be  found  using  the  formula  for  the  communication 
time  for  an  internal  subdomain  [eqn(73)]  and  multiplying  it  with  the  number 
of  processors  in  a  pool  p, 

Tbdry  exch  =  PTdjj  =  4(<Tp  -|-  S^Uy/p).  (75) 

The  total  time  spent  on  boundary  exchanges  varies  proportionally  with  p. 
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Figure  43:  Scaled  grid  (50  x  10  per  processor)  speedup  results. 


For  the  processor  pools  with  a  non-square  number  of  processors  we  have 
run  the  code  on  grids  organized  as  PrXpc  and  the  transpose  Pc'XPn  so  that 
the  number  of  row  cells  versus  column  cells  changed.  The  results  showed 
that  a  decomposition  whose  subdomains  have  longer  rows  performs  better 
than  one  with  longer  columns.  This  is  consistent  with  the  data  cache  misses 
that  were  observed  previously.  An  improvement  of  20-30%  in  the  measured 
speedup  was  obtained  by  modifying  the  domain  partitions.  It  should  be 
noted  that  this  result  is  particular  to  IBM  architecture,  and  the  dependency 
of  the  obtained  speedup  on  domain  decomposition  will  vary  on  other  archi¬ 
tectures.  The  speedup  results  shown  in  Figure  42  for  8  and  32  processors 
have  been  averaged. 

In  order  to  eliminate  the  cache  effects  from  the  performance  analysis  we 
ran  the  code  on  grids  that  scaled  with  the  number  of  processors.  The  size  of 
the  grid  on  each  processor  remained  constant.  As  the  number  of  processors 
was  increased,  the  grid  increased  proportionally.  The  speedup  results  are 
presented  in  Figure  43.  Again  note  that  the  speedup  is  measured  relative 
to  the  sequential  version  of  the  code  and  not  the  parallel  version  run  on  a 
single  processor. 

The  speedup  for  a  perfectly  parallel  code  for  the  scaled  grid  is  unity  for 
any  number  of  processors.  Our  results  show  a  speedup  that  is  less  than 
unity  and  it  decreases  with  the  number  of  processors.  This  is  an  expected 
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result  since  the  total  communication  time  increases  with  the  number  of 
processors.  Since  the  slope  of  the  speedup  is  gradual  and  it  appears  to 
flatten,  we  conclude  that  the  parallel  code  performs  satisfactorily  on  scaled 
grids. 


5  Professional  Interactions 


5.1  Project  Personnel 


The  personnel  who  have  been  directly  involved  in  this  project  are  listed 
below. 


Name 

Uri  Shumlak 
D.  Scott  Eberhardt 
Thomas  R.  Jarboe 
Byoungsoo  Kim 
Julian  Becerra-Sagredo 
Ogden  S.  Jones 
R.  Scott  Raber 
David  Taflin 
Bogdan  Udrea 


Position 

Research  Assistant  Professor 
Associate  Professor 
Professor 

Research  Associate 
Graduate  Student 
Graduate  Student 
Graduate  Student 
Graduate  Student 
Graduate  Student 


5,2  Collaborations 

5.2.1  Air  Force  Resesirch  Laboratory 

Dr.  Robert  Peterkin  and  Dr.  Thomas  Hussey  of  the  Electromagnetic  Sources 
Division  on  three-dimensional  multigrid  algorithms  for  MACHS,  develop¬ 
ment  of  a  parallel  PIC  (particle  in  cell)  code  for  microwave  simulations,  and 
stabilization  of  the  the  Rayleigh- Taylor  instability  in  solid  liner  implosions 
by  introducing  a  sheared  axial  flow. 


5.2.2  National  Oceanic  and  Atmospheric  Administration 

Dr.  Kris  Murawski  of  the  Space  Environment  Center  on  applying  our  code 
to  study  solar  wind  and  corona.  They  currently  have  a  copy  of  our  code  for 
this  application. 


5.2.3  Lawrence  Livermore  National  Laboratory 

Dr.  Charles  Hartman  of  the  Magnetized  Plasmas  Division  on  stabilization 
of  the  z-pinch  using  sheared  axial  flows.  This  collaboration  resulted  in  the 
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publication  listed  below. 

5.2.4  University  of  Michigan 

Prof.  Bram  van  Leer,  Prof.  Kenneth  Powell,  and  Prof.  Philip  Roe  of  the 
Aerospace  Engineering  Department  on  approximate  Riemann  solvers  for  the 
MHD  plasma  model,  in  particular,  eigenvector  normalizations  and  Roe  av¬ 
erages. 

5.2.5  University  of  Colorado 

Prof.  Steve  McCormick  of  the  Applied  Math  Department  on  three-dimensional 
multigrid  algorithms. 

5.2.6  New  Mexico  Institute  of  Technology 

Prof.  Steve  Schaffer  of  the  Applied  Math  Department  on  three-dimensional 
multigrid  algorithms  for  anisotropic  equations. 

5.2.7  University  of  Washington 

Prof.  Randy  LeVeque  of  the  Applied  Math  Department  on  approximate 
Riemann  solvers  and  their  applications  to  multidimensional  problems. 

5.3  Transitions 

Dr.  Kris  Murawski  at  the  Space  Environment  Center  of  the  National  Oceanic 
and  Atmospheric  Administration  requested  and  was  granted  a  copy  of  our 
code.  His  group  will  use  our  code  to  study  solar  wind  and  corona. 

5.4  Publications 

A  journal  article  describing  our  algorithm  has  been  published  in  the  Jour¬ 
nal  of  Computational  Physics.  The  title  is  “An  Implicit  Scheme  for  Nonideal 
Magnetohydrodynamics”  by  O.  S.  Jones,  U.  Shumlak,  and  D.  S.  Eberhardt.[17] 
The  citation  is  Journal  of  Computational  Physics  130,  231  (1997).  We  have 
two  papers  that  have  been  published  in  conference  proceedings.  “Physics  of 
the  Hall  Thruster,”  U.  Shumlak,  T.  R.  Jarboe,  and  R.  A.  Sprenger,  AIAA 
97-3048  (1997),  at  the  1997  Joint  Propulsion  Conference  in  Seattle,  Wash¬ 
ington.  “A  Portable  Parallel  Implicit  Approximate  Riemann  Solver  for  Non- 
Ideal  Magnetohydrodynamics,”  B.  Udrea,  O.  S.  Jones,  U.  Shumlak,  and  D. 
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S.  Eberhardt,  AIAA  97-0366  (1997),  at  the  Aerospace  Science  Meeting  in 
Reno,  Nevada. 

Two  journal  articles  resulting  from  the  collaboration  with  the  Air  Force 
Research  Laboratory  and  Lawrence  Livermore  National  Laboratory  were 
published.  One  article  is  titled  “Sheared  Flow  Stabilization  of  the  m  =  1 
Kink  Mode  in  Z-Pinches”  by  U.  Shumlak  and  C.  W.  Hartman.  The  citation 
is  Physical  Review  Letters  75  (18),  3285  (1995).  The  other  article  is  titled 
“Mitigation  of  the  Rayleigh-Taylor  Instability  by  Sheared  Axial  Flows”  by 
U.  Shumlak  and  N.  F.  Roderick.  The  citation  is  Physics  of  Plasmas  5  (6), 
2384  (1998). 

A  journal  article  has  been  submitted  for  publication  in  the  AIAA  Journal 
of  Propulsion  and  Power.  The  article  is  titled,  “A  Simple  Model  of  Flow 
Evolution  in  the  Hall  Thruster”  by  U.  Shumlak,  T.  R.  Jarboe,  and  R.  A. 
Sprenger. 

5.5  Presentations 

A  paper  discussing  the  magnetic  reconnection  results  was  presented  at  the 
Thirty-Seventh  Annual  American  Physical  Society  Meeting  of  the  Division 
of  Plasma  Physics,  Louisville,  Kentucky,  November  1995.  “Time-Dependent 
Calculations  of  Resistive  Tearing  Instabilities  Using  a  New  Implicit  MHD 
Solver.” 

A  paper  presenting  the  findings  of  the  stabilization  of  the  z-pinch  by 
sheared  axial  flows  was  presented  at  the  Thirty-Seventh  Annual  American 
Physical  Society  Meeting  of  the  Division  of  Plasma  Physics,  Louisville,  Ken¬ 
tucky,  November  1995.  “Sheared  Flow  Stabilization  of  the  m  =  1  Kink  Mode 
in  Z-Pinches.” 

A  paper  was  presented  at  the  Thirty-Eighth  Annual  American  Physi¬ 
cal  Society  Meeting  of  the  Division  of  Plasma  Physics,  Denver,  Colorado, 
November  1996.  The  title  was  “Computer  Simulations  of  Plasma  Acceler¬ 
ators,”  by  B.  Udrea,  O.  S.  Jones,  and  U.  Shumlak.  A  paper  was  presented 
at  the  Thirty-Fifth  AIAA  Aerospace  Science  Meeting,  Reno,  Nevada,  Jan¬ 
uary  1997.  The  title  was  “A  Portable  Parallel  Implicit  Approximate  Rie- 
mann  Solver  for  Non-Ideal  Magnetohydrodynamics,”  by  B.  Udrea,  O.  S. 
Jones,  U.  Shumlak,  and  D.  S.  Eberhardt.  A  paper  was  also  presented  at  the 
Twenty-Fourth  Annual  IEEE  International  Conference  on  Plasma  Sciences, 
San  Diego,  California,  May  1997.  The  title  was  “An  Advanced  Implicit 
Algorithm  for  MHD  Computations  on  Parallel  Architectures,”  by  U.  Shum¬ 
lak,  D.  S.  Eberhardt,  O.  S.  Jones,  and  B.  Udrea.  A  paper  was  presented  at 
the  Thirty-Third  AIAA/ASME/ASEE  Joint  Propulsion  Conference,  Seat- 
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tie,  Washington,  July  1997.  The  title  was  “Physics  of  the  Hall  Thruster,” 
by  U.  Shumlak,  T.  R.  Jarboe,  and  R.  A.  Sprenger. 


6  Conclusions 

The  successful  development  of  the  three-dimensional  advanced  implicit  algo¬ 
rithm,  the  implementation  of  the  algorithm  on  arbitrarily  connected  multi¬ 
block  grids,  and  the  practical  applications  indicate  that  this  project  is  ex¬ 
ceeding  its  objectives.  The  research  from  this  project  has  been  published  in 
a  refereed  journal  and  presented  at  international  conferences. 

Valuable  collaborations  have  been  formed  with  the  Air  Force  Research 
Laboratory,  Lawrence  Livermore  National  Laboratory,  and  other  universi¬ 
ties. 

The  continuing  development  of  this  project  will  include  adding  an  equa¬ 
tion  of  state  package,  investigating  powerful  implicit  matrix  inversion  meth¬ 
ods,  treating  the  destabilizing  Hall  effect,  and  applying  the  code  to  plasma 
experiments  to  calibrate  the  code  and  gain  physical  insight  into  devices  that 
are  important  to  the  Air  Force,  such  as  the  magnetic  flux  compression  gen¬ 
erator  (MCG). 
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